Analyzing Web Site for Translation

ABSTRACT

A system, method and computer readable medium for synchronizing web content is disclosed. The method includes retrieving a first web content in a first language from a web site, the first web content corresponding to a second web content wherein the second web content is a translation in a second language of the first web content. The method further includes dividing the first web content into a plurality of translatable components and generating a unique identifier for each of the plurality of translatable components. The method further includes matching each of the plurality of translatable components to a plurality of translated components of the second web content using the unique identifier of each of the plurality of translatable components. If a translatable component is not matched to a translated component, the method further includes designating the translatable component for translation into the second language.

CROSS-REFERENCE TO RELATED APPLICATIONS

This non-provisional application is a continuation of U.S. patent application Ser. No. 13/933,815 filed Jul. 2, 2013, which is a continuation of U.S. patent application Ser. No. 12/609,834, entitled “ANALYZING WEB SITE FOR TRANSLATION”, filed on Oct. 30, 2009, which is a continuation of U.S. patent application Ser. No. 10/784,334, entitled “ANALYZING WEB SITE FOR TRANSLATION”, filed on Feb. 23, 2004, which is a continuation-in-part of the provisional patent application Ser. No. 60/449,571 with inventors Enrique Travieso, Adam Rubenstein, and William Fleming and entitled “TRANSLATION SYSTEM ARCHITECTURE” filed Feb. 21, 2003, and commonly assigned herewith to Motionpoint Corporation, which is hereby incorporated by reference in its entirety. This non-provisional application is further related to non-provisional patent application Ser. No. 10/784,727 with inventors Enrique Travieso and Adam Rubenstein, entitled “DYNAMIC LANGUAGE TRANSLATION OF WEB SITE CONTENT;” to non-provisional patent application Ser. No. 10/784,726 with inventors Enrique Travieso, Adam Rubenstein, Arcadio Andrade and Collin Birdsey, entitled “AUTOMATION TOOL FOR WEB SITE CONTENT LANGUAGE TRANSLATION;” and to non-provisional patent application Ser. No. 10/784,868 with inventors Enrique Travieso, and Adam Rubenstein, entitled “SYNCHRONIZATION OF WEB SITE CONTENT BETWEEN LANGUAGES,” all of which were filed on Feb. 23, 2004. The entire teaching and contents of these related applications are hereby incorporated herein by reference.

FIELD OF THE INVENTION

The present invention generally relates to web sites, and more particularly relates to dynamic translation of web site content to another language.

BACKGROUND OF THE INVENTION

The Internet and the world-wide web have allowed consumers to complete business transactions with organizations located across continents from the comfort of their own desk. In an increasingly global marketplace, it is becoming imperative for organizations to provide web site content in multiple languages in order to expand their customer base beyond the organization's home country. In addition, as the demographics of a country change to include foreign language speakers, it is increasingly important to communicate with those customers and potential customers in their native language. For example, several large U.S. retailers have announced that serving the Hispanic segment is now a very high priority. Some U.S. retailers have even hired Hispanic ad agencies to start marketing to the Hispanic market in their native language, Spanish.

Currently, an organization that wants to translate its web site to another language can choose from several techniques, each having significant drawbacks. One technique involves purchasing machine translation technology. Machine translation is sometimes useful to get a rough idea of the meaning of the content in a web site, but it is far from ideal. For most organizations, this type of translation, although convenient, is not practical because the quality of the translation is simply not good enough to be posted on their web sites.

Another technique involves managing the translation process by deploying human translators and either maintaining multiple web sites for each language, or re-architecting the existing web site back-end technology to accommodate multiple languages. This requires significant resources in terms of time and cost, including a high level of complexity and duplication of effort. Dynamic and e-commerce sites present additional challenges, as the information to be translated resides in multiple places (e.g., a Structured Query Language database, static Hyper Text Markup Language pages and dynamic Hyper Text Markup Language page templates) and each translated site must interface with the same e-commerce or back-end engine. Further, as the web site changes, ongoing maintenance must also be handled. This approach will yield vastly superior translations that are suitable for professional web sites of large organizations, but at great cost. Most organizations simply do not have, or do not want to invest in, the resources necessary to handle this task internally.

For example, FIG. 1 is a block diagram illustrating the system architecture of a conventional web site. The web site of FIG. 1 is presented in a first language, such as English. FIG. 1 shows a web server 112 connected to the Internet 116 via a web connection. A public user 118, such as a person using a computer with a web connection, can access the web server 112 via the Internet 116 and download information, such as a web page 114, from the web server 112 for viewing. The web server 112 is operated by programming logic 110, consisting of instructions on how to retrieve, serve, and accept information for processing. The web server 112 further has access to a database 102 of information, as well as Hyper Text Markup Language (HTML) template files 104, graphics files 106 and multimedia files 108, all of which constitute the web site served by web server 112.

FIG. 2 is a block diagram illustrating the system architecture of a conventional web site presented in two languages. The web site of FIG. 2 is presented in a first language, such as English (as shown above for FIG. 1) and in a second language, such as Spanish. FIG. 2 shows the web server 112 and the other English language components described in FIG. 1, including the database 102 of information, the HTML template files 104, graphics files 106, multimedia files 108 and programming logic 110. FIG. 2 further shows the public user 118 accessing the web server 112 via the Internet 116 and downloading information, such as a web page 202 in the English or Spanish language.

FIG. 2 also shows the Spanish language components 204 of the web site, including the database 208 of information, the HTML template files 214, graphics files 216, multimedia files 210 and programming logic 212. The aforementioned Spanish language components are managed by a multi-lingual content manager 206, which manages requests for information in the dual languages. FIG. 2 further shows that the web server 112 must be re-engineered to serve multiple sets of content in different languages.

As can be seen in the difference between FIG. 1 and FIG. 2, the deployment of the Spanish language components 204 of FIG. 2 requires a significant expenditure of time and resources. Further, the deployment requires the re-engineering of the web server 112, adding to the time and cost associated with the deployment. Additionally, once the Spanish language components 204 have been established, they must be kept synchronized with the English language components, resulting in a recurring cost. This is disadvantageous, as most organizations simply do not have the resources necessary to perform this task.

Therefore a need exists to overcome the problems with the prior art as discussed above.

SUMMARY OF THE INVENTION

Briefly, in accordance with the present invention, disclosed is a system, method and computer readable medium for synchronizing web content. In an embodiment of the present invention, the method on an information processing system includes retrieving a first web content in a first language from a web site, the first web content corresponding to a second web content wherein the second web content is a translation in a second language of the first web content. The method further includes dividing the first web content into a plurality of translatable components and generating a unique identifier for each of the plurality of translatable components of the first web content. The method further includes matching each of the plurality of translatable components of the first web content to a plurality of translated components of the second web content using the unique identifier of each of the plurality of translatable components of the first web content. If a translatable component of the first web content is not matched to a translated component of the second web content, the method further includes designating the translatable component of the first web content for translation into the second language.
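The matching and designation steps summarized above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each component identifier is a SHA-256 digest of the component text and that translated components are stored in a dictionary keyed by that identifier; `component_id` and `find_untranslated` are hypothetical names, not part of the disclosed system.

```python
import hashlib


def component_id(component: str) -> str:
    # Hypothetical identifier scheme: SHA-256 of the UTF-8 encoded text.
    return hashlib.sha256(component.encode("utf-8")).hexdigest()


def find_untranslated(translatable: list[str], translated_by_id: dict[str, str]) -> list[str]:
    """Return the components of the first web content with no translated match.

    `translated_by_id` maps an identifier to the translated component in the
    second language (an assumed storage layout). Every component returned is
    one that would be designated for translation.
    """
    return [c for c in translatable if component_id(c) not in translated_by_id]


translated = {component_id("Add to cart"): "Añadir al carrito"}
print(find_untranslated(["Add to cart", "Free shipping"], translated))  # ['Free shipping']
```

Because the lookup is keyed only by identifier, any change to a component's text yields a new identifier, so edited content on the original site naturally surfaces as unmatched.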

Also disclosed is a web server for synchronizing web content. The web server includes a web connection for retrieving a first web content in a first language from a web site, the first web content corresponding to a second web content wherein the second web content is a translation in a second language of the first web content. The web server further includes a processor for dividing the first web content into a plurality of translatable components and generating a unique identifier for each of the plurality of translatable components of the first web content. The processor is further configured for matching each of the plurality of translatable components of the first web content to a plurality of translated components of the second web content using the unique identifier of each of the plurality of translatable components of the first web content. If a translatable component of the first web content is not matched to a translated component of the second web content, the processor designates the translatable component of the first web content for translation into the second language.

Also disclosed is a computer program product including computer instructions for synchronizing web content. In an embodiment of the present invention, the computer instructions include instructions on an information processing system for retrieving a first web content in a first language from a web site, the first web content corresponding to a second web content wherein the second web content is a translation in a second language of the first web content. The computer instructions further include instructions for dividing the first web content into a plurality of translatable components and generating a unique identifier for each of the plurality of translatable components of the first web content. The computer instructions further include instructions for matching each of the plurality of translatable components of the first web content to a plurality of translated components of the second web content using the unique identifier of each of the plurality of translatable components of the first web content. If a translatable component of the first web content is not matched to a translated component of the second web content, the computer instructions further include instructions for designating the translatable component of the first web content for translation into the second language.

The preferred embodiments of the present invention are advantageous because of the ease of implementation of the disclosed systems. As discussed below, the present invention allows for the deployment of a corresponding web site in another language with a reduced amount of configuring of the original web site. This reduces the amount of Information Technology (IT) resources that must be consumed by the providers of the original web site and reduces the amount of time necessary for deployment. Also as discussed below, only a single link is required to be deployed on the original web site in order to provide access to the corresponding web site in another language. This is beneficial as it reduces the amount of time and effort that must be expended by the providers of the original web site in order to release the corresponding web site in another language.

The present invention is further advantageous because it allows for the use of human translation, thereby producing a high quality translation of the original web site in another language. This is beneficial as it reduces or avoids the use of machine translation, which can be of low quality. Additionally, the present invention preserves the formatting of the original web site, including when a translation is of a larger size or length than the original text. This is beneficial as it allows for the preservation of the look and feel of the original web site, thereby allowing users to maintain familiarity with the corresponding web site in another language.

The present invention is further advantageous because it supports large, complex and rapidly-changing web sites. As explained in greater detail below, the present invention supports web sites with any number of web pages, links, downloads and other materials, thereby allowing for greater flexibility and usability of the present invention. The present invention also supports web sites that change continuously or periodically, as it regularly polls the web site to discern changes and initiate corresponding translations. This is beneficial as it reduces the amount of time and effort that is expended on the maintenance of a corresponding web site in another language.

The present invention is further advantageous because it provides a corresponding web site in a second language, thereby meeting the needs of customers speaking the second language. This is beneficial as it generates traffic consisting of customers speaking the second language and provides customers speaking the second language a self-service e-commerce option. This is also beneficial because it provides more accessible shopping opportunities for customers in the second language and provides a more user-friendly environment for these clients in the second language.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating the system architecture of a conventional web site.

FIG. 2 is a block diagram illustrating the system architecture of a conventional web site presented in two languages.

FIG. 3 is a block diagram illustrating the system architecture of a web site presented in two languages, in one embodiment of the present invention.

FIG. 4 is a block diagram illustrating the system architecture of the present invention, in one embodiment of the present invention.

FIG. 5 is an operational flow diagram depicting the process of the translation server, according to a preferred embodiment of the present invention.

FIGS. 6A-6C illustrate an operational flow diagram depicting the serving process of the translation server, according to a preferred embodiment of the present invention.

FIG. 7 is a block diagram depicting the serving process in an ASP model of the translation server, according to a preferred embodiment of the present invention.

FIG. 8 is a block diagram depicting the serving process in a web service model of the translation server, according to a preferred embodiment of the present invention.

FIG. 9 is a screenshot of a WebCATT interface used for viewing a translatable component, in one embodiment of the present invention.

FIG. 10 is a screenshot of a WebCATT interface used for viewing a translatable component along with a corresponding translation, in one embodiment of the present invention.

FIG. 11 is a screenshot of a WebCATT interface used for editing a translatable component, in one embodiment of the present invention.

FIG. 12 is a screenshot of a WebCATT interface used for viewing a translation queue, in one embodiment of the present invention.

FIG. 13 is an operational flow diagram depicting the process of WebCATT, according to a preferred embodiment of the present invention.

FIG. 14 is an operational flow diagram depicting the process of the spider, according to a preferred embodiment of the present invention.

FIG. 15 is an operational flow diagram depicting the synchronization process according to a preferred embodiment of the present invention.

FIG. 16 is a block diagram showing a computer system useful for implementing the present invention.

DETAILED DESCRIPTION

The present invention, according to a preferred embodiment, overcomes problems with the prior art by providing an efficient and easy-to-implement system and method for dynamic language translation of a web site.

OVERVIEW

FIG. 3 is a block diagram illustrating the system architecture of a web site presented in two languages, in one embodiment of the present invention. The web site of FIG. 3 is presented in a first language, such as English, and a second language, such as Spanish. FIG. 3 shows the web server 112 of FIG. 1 connected to the Internet 116 via a web connection. Also as shown in FIG. 1, a public user 118 accesses the web server 112 via the Internet 116 and downloads information, such as a web page, from the web server 112 for viewing. The user 118 utilizes a client application, such as a web browser, on his client computer to connect to the web site via the Internet 116. Once connected to the web site, the user 118 browses through the products or services offered by the web site by navigating through its web pages.

The web server 112 is operated by programming logic 110 and the web server 112 further has access to a database 102 of information, as well as HTML template files 104, graphics files 106 and multimedia files 108, all of which constitute the English components of the web site served by web server 112.

FIG. 3 further shows translation server 300 situated apart from and existing independently from the web server 112. The translation server 300 embodies the main functions of the present invention, including the provision of a web site in a secondary language, such as Spanish. The translation server 300 provides the secondary language components of a base web site, which is provided by web server 112, without requiring integration with the base web site or re-configuring or re-engineering of the web server 112.

As can be seen in the difference between FIG. 2 and FIG. 3, the deployment of the secondary language components of FIG. 3 requires significantly less time and fewer resources than the deployment of FIG. 2. Further, the deployment of FIG. 3 does not require the re-engineering of the web server 112. Additionally, once the secondary language components have been established by the translation server 300, they are automatically kept synchronized with the English language components of the base web site. Thus, the system of the present invention is advantageous as it reduces the amount of time, effort and resources that are required to deploy a secondary language web site.

FIG. 4 is a block diagram illustrating the system architecture of the present invention, in one embodiment of the present invention. FIG. 4 presents an alternative point of view of the system architecture of the present invention. FIG. 4 shows a web site 414 representing a web site in a first language such as English that is connected to the Internet 412 via a web connection. FIG. 4 further shows a user 416 that utilizes a web connection to the Internet 412 to browse and navigate the web pages served by the web site 414.

FIG. 4 further shows the translation server 400, corresponding to the translation server 300 of FIG. 3, and a translation database 406 for use by the translation server 400 in storing translatable components during the serving of web pages in a secondary language such as Spanish. This process is described in greater detail below. Also shown in FIG. 4 is the Web Computer Aided Translation Tool (WebCATT) 408, which is a tool for aiding a human 418 or an admin 410 in translating the components of a web site in a first language. Further shown is a spider 404 for use in analyzing and sizing a web site 414. The translation server 400 and WebCATT tool 408 are connected to a web server 402, which is the conduit through which all web actions of the above tools are channeled. The translation server 400 and WebCATT tool 408 are described in greater detail below.

In an embodiment of the present invention, the computer systems of translation server 400, WebCATT tool 408, spider 404 and web server 402 are one or more Personal Computers (PCs) (e.g., IBM or compatible PC workstations running the Microsoft Windows 95/98/2000/ME/CE/NT/XP operating system, Unix, Linux, Macintosh computers running the Mac OS operating system, or equivalent), Personal Digital Assistants (PDAs), game consoles or any other information processing devices. In another embodiment of the present invention, the computer systems of translation server 400, WebCATT tool 408, spider 404 and web server 402 are server systems (e.g., SUN Ultra workstations running the SunOS operating system or IBM RS/6000 workstations and servers running the AIX operating system).

In one embodiment of the present invention, Internet network 412 is a circuit switched network, such as the Public Switched Telephone Network (PSTN). In another embodiment of the present invention, the network 412 is a packet switched network. The packet switched network is, for example, a wide area network (WAN) such as the global Internet or a private WAN, a local area network (LAN), a telecommunications network or any combination of the above-mentioned networks. In another embodiment of the present invention, network 412 is a wired network, a wireless network, a broadcast network or a point-to-point network.

Translation Server

INTRODUCTION

The translation server 400 is the back-end application responsible for the conversion of web pages to another language. The translation server 400 parses each incoming HTML page into translatable components, substitutes each incoming translatable component with an appropriate translated component, and returns the translated web page back to the online user 416. Page conversion is performed on the fly each time an online user 416 requests a page in the second or alternate language. When a web page is received for conversion, the translation server 400 will translate the page if enough translated content is available to meet a customer-specified translation threshold. If this is not the case, then the page will be returned in the first or original language.
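As a rough illustration of the parsing step, the following Python sketch collects visible text segments from a page using the standard library's `html.parser` module. The class and function names are hypothetical, and treating each run of text between tags as one segment is a simplifying assumption; the description above does not say how segment boundaries are chosen.

```python
from html.parser import HTMLParser


class SegmentExtractor(HTMLParser):
    """Collect visible text segments from an HTML page (illustrative only)."""

    def __init__(self) -> None:
        super().__init__()
        self.segments: list[str] = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text that is not script or style source.
        if text and not self._skip_depth:
            self.segments.append(text)


def extract_segments(html: str) -> list[str]:
    parser = SegmentExtractor()
    parser.feed(html)
    parser.close()
    return parser.segments


page = "<html><body><h1>Welcome</h1><p>Free <b>shipping</b></p><script>var x = 1;</script></body></html>"
print(extract_segments(page))  # ['Welcome', 'Free', 'shipping']
```

Note that inline markup such as `<b>` splits one sentence into several segments here; a production parser would likely need a policy for keeping such phrases together.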

A translatable component includes any one of a text segment, an image file with text to be translated, a multimedia file with text or audio to be translated, a file with text to be translated, a file with an image with text to be translated, a file with audio to be translated and a file with video with at least one of text and audio to be translated.

The page conversion process follows seven major steps. In a first step, for each text segment encountered, if a translation is available, the translation server 400 replaces the text segment with the translated text segment. If no translation is available, either the text remains in the original language or a machine translation is performed on the fly, depending on the customer's preference. In a second step, for each linked file (images, PDF files, Flash movies, etc.) encountered, if a translated file is available, the HTML link tag is rewritten so that it points to the translated file. If a translated file is not available, the original link tag is left untouched. In a third step, any relative Uniform Resource Locator (URL) found in the page is converted to an absolute URL. This is necessary because the browser resolves relative URLs based on the URL of the current page. In the case of a translated page, the URL of the page is actually on the translation server 400. As a result, the browser would request all files and links with relative URLs from the translation server 400, which is not the correct original location.
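The third step can be sketched with `urllib.parse.urljoin` from the Python standard library, which applies the standard relative-reference resolution rules. Leaving fragment-only links such as `#top` untouched is an added assumption (resolving them against the original URL would send the user back to the untranslated page). The function name is hypothetical, and the ABC Widgets host comes from the ASP model example in this description.

```python
from urllib.parse import urljoin


def to_absolute(original_page_url: str, link: str) -> str:
    """Resolve a link found in a page against the page's ORIGINAL location.

    The base must be the original site URL, not the translation server URL
    the browser sees; otherwise relative links would point at the wrong host.
    """
    if link.startswith("#"):
        # Assumption: in-page anchors stay relative so they scroll the
        # translated page instead of navigating away from it.
        return link
    # urljoin leaves absolute URLs (and schemes such as mailto:) unchanged.
    return urljoin(original_page_url, link)


page = "http://www.abcwidgets.com/shop/index.html"
print(to_absolute(page, "images/logo.gif"))  # http://www.abcwidgets.com/shop/images/logo.gif
print(to_absolute(page, "../about.html"))    # http://www.abcwidgets.com/about.html
```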

In a fourth step, each JavaScript block is parsed for directive tags that indicate text content to translate. Images are automatically detected by recognition of the file extension. Script tags that reference external JavaScript files are rewritten so that they are redirected to the translation server 400. They are then parsed and translated in a separate browser Hyper Text Transfer Protocol (HTTP) request. In a fifth step, each link to another web page is rewritten so that the original URL is redirected to the translation server 400. When an online user clicks on a rewritten link, the request then goes directly to the translation server 400 and the page is in turn translated. Links to other web pages placed in JavaScript blocks are automatically recognized, either by extension or by pre-defined customer-specific URL patterns, and also rewritten for redirection. This feature, which keeps the user in the alternate language as they browse the site, is called “implicit navigation”.
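The recognition of page links inside JavaScript blocks might look like the following Python sketch, which scans quoted string literals and keeps those that end in a page extension or match a customer-supplied regular expression. The extension list, the example pattern and the function name are all hypothetical; the text above states only that recognition is by extension or by pre-defined customer-specific URL patterns.

```python
import re

# Hypothetical defaults; a real deployment would configure these per customer.
PAGE_EXTENSIONS = (".html", ".htm", ".asp", ".aspx", ".jsp", ".php")
# A quoted string literal. Simplification: escaped quotes are not handled.
STRING_LITERAL = re.compile(r"""(["'])(.*?)\1""")


def find_page_links(script: str, customer_patterns: list[str]) -> list[str]:
    """Return string literals in `script` that look like links to web pages."""
    patterns = [re.compile(p) for p in customer_patterns]
    links = []
    for match in STRING_LITERAL.finditer(script):
        value = match.group(2)
        # Ignore any query string or fragment when checking the extension.
        path = value.split("?", 1)[0].split("#", 1)[0].lower()
        if path.endswith(PAGE_EXTENSIONS) or any(p.search(value) for p in patterns):
            links.append(value)
    return links


script = 'var next = "catalog/page2.html"; var logo = "logo.gif"; go(\'/product?id=42\');'
print(find_page_links(script, [r"^/product\?id=\d+$"]))  # ['catalog/page2.html', '/product?id=42']
```

Each link found this way would then be rewritten to point at the translation server 400 so that implicit navigation also works for script-driven links.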

In a sixth step, for each directive tag or attribute found, the appropriate instruction is performed. In a seventh step, if a translation cannot be found for one or more text segments or linked files in the page, the translation server 400 automatically schedules the web page for translation by placing it in the WebCATT 408 translation queue.

FIG. 5 is an operational flow diagram depicting the process of the translation server 400, according to a preferred embodiment of the present invention. The operational flow diagram of FIG. 5 depicts the process by which the translation server 400 responds to a user request for a web page in a secondary language. The operational flow diagram of FIG. 5 begins with step 502 and flows directly to step 504.

In step 504, the translation server 400 receives a request from a user 416 on a web site 414, the web site 414 having a first web content in a first language such as English. The request, such as an HTTP request or a Simple Mail Transfer Protocol (SMTP) request, calls for a second web content in a second language such as Spanish. The second web content is a human or machine translation in a second language of the first web content. The first language includes any one of English, French, Spanish, German, Portuguese, Italian, Japanese, Chinese, Korean, and Arabic and the second language is different than the first language and includes any one of English, French, Spanish, German, Portuguese, Italian, Japanese, Chinese, Korean, and Arabic.

In step 506, the translation server 400 retrieves the first web content from the web site 414. In step 508, the translation server 400 divides the first web content into a plurality of translatable components. In step 510, the translation server 400 generates a unique identifier for each of the plurality of translatable components of the first web content. For a text segment, the translation server 400 can generate a unique identifier using a hash code, a checksum or a mathematical algorithm.
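Step 510 leaves the choice of identifier open. The following Python sketch shows two standard-library options, a cryptographic hash (`hashlib.sha256`) and a 32-bit checksum (`zlib.crc32`); which one, if either, the translation server 400 uses is not specified, and the function names are hypothetical. A shorter identifier is cheaper to store and index but collides more often, which is one reason a lookup can return several candidate translations (steps 608 and 620 of FIGS. 6A-6C).

```python
import hashlib
import zlib


def segment_hash_id(segment: str) -> str:
    """256-bit SHA-256 digest of the UTF-8 text, as 64 hex characters."""
    return hashlib.sha256(segment.encode("utf-8")).hexdigest()


def segment_checksum_id(segment: str) -> int:
    """32-bit CRC of the UTF-8 text (always non-negative in Python 3)."""
    return zlib.crc32(segment.encode("utf-8"))


for text in ("Add to cart", "Add to cart ", "Free shipping"):
    print(repr(text), segment_hash_id(text)[:12], segment_checksum_id(text))
```

Both functions hash the exact text, so `"Add to cart"` and `"Add to cart "` receive different identifiers; any whitespace or case normalization would have to be applied before the identifier is computed.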

In step 512, the translation server 400 identifies a plurality of translated components of the second web content using the unique identifier of each of the plurality of translatable components of the first web content. In step 514, the translation server 400 arranges or puts the plurality of translated components of the second web content to preserve a format that corresponds to the first web content, which can include putting in formatting tags that are not visible in the first web content. In step 516, the translation server 400 provides the second web content in response to the request that was received. In step 518, the control flow of FIG. 5 stops.
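One way to picture step 514 is a token list in which markup and text alternate: translated text replaces each original text token while every tag passes through unchanged, so the page structure, and therefore its formatting, is preserved. The token representation and the `reassemble` name are assumptions made for this Python sketch.

```python
from typing import Callable, Optional

Token = tuple[str, str]  # ("tag", "<b>") or ("text", " Free shipping ")


def reassemble(tokens: list[Token], translate: Callable[[str], Optional[str]]) -> str:
    """Rebuild a page, replacing text tokens and copying tags unchanged."""
    out = []
    for kind, value in tokens:
        core = value.strip()
        if kind == "text" and core:
            translated = translate(core)
            if translated is not None:
                # Keep the original surrounding whitespace so spacing is unchanged.
                lead = value[: len(value) - len(value.lstrip())]
                trail = value[len(value.rstrip()):]
                value = lead + translated + trail
        out.append(value)
    return "".join(out)


tokens = [("tag", "<p>"), ("text", " Free shipping "), ("tag", "</p>")]
print(reassemble(tokens, {"Free shipping": "Envío gratis"}.get))  # <p> Envío gratis </p>
```

A text token with no available translation is emitted unchanged, which mirrors the fallback to the original language described for the first conversion step.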

FIGS. 6A-6C illustrate an operational flow diagram depicting the serving process of the translation server 400, according to a preferred embodiment of the present invention. The operational flow diagram of FIGS. 6A-6C depicts the process by which the translation server 400 provides a web page in a secondary language in response to a user request. Specifically, the operational flow diagram of FIGS. 6A-6C provides more detail with regard to steps 508-514 of FIG. 5 above. The operational flow diagram of FIGS. 6A-6C begins with step 601 and flows directly to step 602.

Step 601 begins with a source HTML page or first web content of step 506 of FIG. 5. In step 602, at least one portion of the first web content is parsed into a translatable component. In step 603, it is determined whether the end of the file of the first web content is reached. If the result of the determination is affirmative, then control flows to step 612. Otherwise, control flows to step 604. In step 604, it is determined whether the translatable component that was parsed in step 602 is a text segment. If the result of the determination is affirmative, then control flows to step 605. Otherwise, control flows to step 614.

In step 605, a hash code or other unique identifier is computed for the text segment. In step 606, using the unique identifier, a matching translated text segment is looked up in a cache. In step 607, it is determined whether the matching translated text segment is found in the cache. If the result of the determination is affirmative, then control flows to step 608. Otherwise, control flows to step 618. In step 608, it is determined whether multiple matching translated text segments were found in the cache. If the result of the determination is affirmative, then control flows to step 620. Otherwise, control flows to step 609. In step 620, the correct translated segment is determined using the sequence constraints and a character-by-character comparison. In step 609, it is determined whether translation of the text segment is suppressed or the text segment is not yet translated. If the result of the determination is affirmative, then control flows to step 621. Otherwise, control flows to step 610.
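The disambiguation in step 620 can be sketched as follows. The description does not define the sequence constraints, so this Python sketch assumes each cached entry records the source text it was translated from and the identifier of the segment that preceded it on the page. Candidates whose source text differs (for example, because of a hash collision) are rejected by an exact string comparison, and among the remaining candidates the one whose recorded predecessor matches is preferred. `CacheEntry` and `pick_translation` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    source: str                 # original-language text the translation was made from
    translation: str            # translated text segment
    previous_id: Optional[str]  # assumed: id of the segment that preceded it on the page


def pick_translation(segment: str, previous_id: Optional[str],
                     candidates: list[CacheEntry]) -> Optional[str]:
    """Choose one translation among cache entries sharing an identifier."""
    # Character-by-character comparison drops entries that merely share the identifier.
    exact = [c for c in candidates if c.source == segment]
    if not exact:
        return None  # treated as "not found"
    # Assumed sequence constraint: prefer the entry recorded after the same segment.
    for entry in exact:
        if entry.previous_id == previous_id:
            return entry.translation
    return exact[0].translation


candidates = [
    CacheEntry("Hone", "Afilar", None),      # collision: different source text
    CacheEntry("Home", "Inicio", "nav"),
    CacheEntry("Home", "Hogar", "article"),
]
print(pick_translation("Home", "article", candidates))  # Hogar
```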

In step 610, the matching translated text segment is set as a target segment. In step 621, the current text segment is set as the target segment. In step 640, the target segment is added to the output web content, or second web content (i.e., the translated HTML page or the output HTML page). In step 623, the second web content is output for provision to the user requesting the web page.

In step 612, it is determined whether there is an incomplete translation of the current web page, i.e., the first web content. If the result of the determination is affirmative, then control flows to step 613. Otherwise, control flows to step 611. In step 613, the current web page is scheduled for translation. In step 611, the translation activity performed by the translation server 400 in servicing the current web page is recorded in the translation database 406. In step 625, it is determined whether the percentage of the current web page (i.e., the first web content) that is translated is above a threshold. If the result of the determination is affirmative, then control flows to step 624. Otherwise, control flows to step 626. In step 624, the second web content or translated HTML page is output for provision to the user requesting the web page. In step 626, the current web page or first web content is output unchanged for provision to the user requesting the web page.
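Step 625, together with the customer-specified threshold mentioned in the introduction to the translation server, can be illustrated with a short Python sketch. Measuring the percentage by counting translatable components, treating a page with no translatable components as fully translated, and using an inclusive comparison are all assumptions; the function names are hypothetical.

```python
def translated_percentage(total_components: int, translated_components: int) -> float:
    """Share of translatable components that have a translation, in percent."""
    if total_components == 0:
        return 100.0  # assumption: nothing to translate counts as fully translated
    return 100.0 * translated_components / total_components


def select_output(original_html: str, translated_html: str,
                  total_components: int, translated_components: int,
                  threshold_percent: float) -> str:
    """Serve the translated page only if it meets the threshold (step 625)."""
    if translated_percentage(total_components, translated_components) >= threshold_percent:
        return translated_html  # step 624
    return original_html        # step 626


print(select_output("<p>Hello</p>", "<p>Hola</p>", 10, 9, 85.0))  # <p>Hola</p>
```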

In step 614, it is determined whether the translatable component parsed in step 602 is a translatable file such as a PDF file, an image file, etc. If the result of the determination is affirmative, then control flows to step 615. Otherwise, control flows to step 629. In step 629, it is determined whether the translatable component parsed in step 602 is a link to another translatable page. If the result of the determination is affirmative, then control flows to step 628. Otherwise, control flows to step 627. In step 627, a tag is added to the translated HTML page to indicate a link (this is described in greater detail below). In step 628, the link is modified to redirect the URL (this is described in greater detail below).
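The branching in steps 614 and 629 depends on recognizing what kind of component was parsed. For link targets, a Python sketch based on the file extension (the approach the description mentions for images) might look like this; the extension sets and the return labels are hypothetical and would in practice be configured per customer.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical extension sets; real values would be configured per customer.
TRANSLATABLE_FILE_EXTENSIONS = {".pdf", ".gif", ".jpg", ".jpeg", ".png", ".swf"}
PAGE_EXTENSIONS = {"", ".html", ".htm", ".asp", ".aspx", ".jsp", ".php"}


def classify_link(url: str) -> str:
    """Label a link target by the extension of its URL path."""
    extension = posixpath.splitext(urlparse(url).path)[1].lower()
    if extension in TRANSLATABLE_FILE_EXTENSIONS:
        return "translatable_file"  # continues at step 615
    if extension in PAGE_EXTENSIONS:
        return "page_link"          # continues at step 628
    return "other"                  # falls through to step 627


print(classify_link("http://www.abcwidgets.com/catalog.pdf"))  # translatable_file
```

An empty extension is treated as a page so that directory-style URLs such as `http://www.abcwidgets.com/` are rewritten for implicit navigation.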

In step 615, a translated file corresponding to the translatable file is looked up in a cache. In step 616, it is determined whether the translated file was found. If the result of the determination is affirmative, then control flows to step 617. Otherwise, control flows to step 633. In step 633, the translated file is looked up in the translation database 406. In step 635, it is determined whether the translated file was found. If the result of the determination is affirmative, then control flows to step 634. Otherwise, control flows to step 632. In step 634, the translated file that was found is stored in the cache. In step 632, an incomplete translation is recorded in the translation database 406. In step 630, the original web page is set as the target file. In step 631, the target file is added to the translated HTML page.

In step 617, it is determined whether translation is suppressed for the translatable file. If the result of the determination is affirmative, then control flows to step 630. Otherwise, control flows to step 636. In step 636, the translated file is set as the target file. In step 618, using the unique identifier, a matching translated text segment is looked up in the translation database 406. In step 622, it is determined whether the matching translated text segment is found in the database. If the result of the determination is affirmative, then control flows to step 619. Otherwise, control flows to step 637. In step 619, the translated segment that was found is stored in the cache. In step 637, an incomplete translation is recorded in the translation database 406.

In step 638, it is determined whether a machine translation of the text segment can be performed. If the result of the determination is affirmative, then control flows to step 639. Otherwise, control flows to step 621. In step 639, the machine translation is set as the target segment.
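The text segment branch (steps 618 through 639) can be sketched as a lookup chain. This is a hypothetical illustration: dictionaries stand in for the cache and the translation database 406, the cache check before the database mirrors the file branch (steps 615 and 616) and is an assumption for segments, and returning the source text when no machine translation is possible is an assumed stand-in for step 621, which is not described here.

```python
from typing import Callable, Dict, List, Optional


def resolve_segment(
    key: str,
    source_text: str,
    cache: Dict[str, str],
    database: Dict[str, str],
    incomplete: List[str],
    machine_translate: Optional[Callable[[str], str]] = None,
    machine_allowed: bool = False,
) -> str:
    """Return the target segment for one text segment (hypothetical helper)."""
    # Assumed: the cache is consulted first, as in the file branch.
    if key in cache:
        return cache[key]
    # Steps 618/622: look the segment up by its unique identifier.
    found = database.get(key)
    if found is not None:
        cache[key] = found  # step 619: keep the hit in the cache
        return found
    # Step 637: record that this segment still needs a translation.
    incomplete.append(key)
    # Steps 638/639: use machine translation when it is permitted.
    if machine_allowed and machine_translate is not None:
        return machine_translate(source_text)
    # Assumed behaviour for step 621: leave the original text in place.
    return source_text
```

Passing the machine translator as a callable keeps the sketch independent of any particular translation engine.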

ASP Model

The translation server 400 can be deployed under a variety of models. In the Application Service Provider (ASP) model, the translation server 400 converts one full web page or script file at a time and delivers it directly to the online user 416. Under this model, the links in a web page are rewritten so that the request is redirected to the translation server 400. For example, the URL of the translation server 400 for a fictional customer called ABC Widgets is defined as: http://trans1.motionpoint.net/abcwidgets/enes/

Then the link <a href=“http://www.abcwidgets.com”> would be rewritten as follows: <a href=“http://trans1.motionpoint.net/abcwidgets/enes/?24;http://www.abcwidgets.com”>

Clicking on the above rewritten link results in the browser request being sent to the translation server 400. The translation server 400 in turn reads the original URL passed in the query string (i.e., everything after the question mark), requests the page from the ABC Widgets server, converts it to the alternate language, and sends it back to the user 416.
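The rewriting and the reverse parsing performed by the translation server can be sketched as a pair of string functions. The function names are hypothetical; the URL layout (server URL, `?`, action ID, `;`, original URL) is taken from the example above. Splitting only on the first `;` is a design choice that keeps an original URL containing its own `?` or `;` intact.

```python
from typing import Tuple

TRANSLATION_SERVER = "http://trans1.motionpoint.net/abcwidgets/enes/"


def rewrite_link(original_url: str, action_id: int = 24,
                 server: str = TRANSLATION_SERVER) -> str:
    """Prefix the original URL with the server URL and the action ID."""
    return f"{server}?{action_id};{original_url}"


def parse_request(request_url: str,
                  server: str = TRANSLATION_SERVER) -> Tuple[int, str]:
    """Recover (action_id, original_url) from a rewritten request URL."""
    prefix = server + "?"
    if not request_url.startswith(prefix):
        raise ValueError("not a translation-server URL")
    query = request_url[len(prefix):]
    # Partition on the first ';' only; the original URL is kept verbatim.
    action, _, original = query.partition(";")
    return int(action), original
```

For instance, `rewrite_link("http://www.abcwidgets.com")` yields the rewritten link shown above, and `parse_request` applied to it returns `(24, "http://www.abcwidgets.com")`.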

FIG. 7 is a block diagram depicting the serving process in an ASP model of the translation server 400, according to a preferred embodiment of the present invention. In a first step 702, the user 416 clicks on a link of a web page in a first language on the web site 414. The link points to a page to be translated. The translation server 400 receives the request and processes it. In a second step 704, the translation server 400 forwards the request to the web site 414 and in a third step 706, the web site 414 provides the page to the translation server 400 for translation. In a fourth step 708, the translation server 400 translates the page using the translations in the translation database 406 and sends the translated page to the user 416.

Web Service Model

In the web service model, the translated content is not delivered directly to the online user 416. Instead, the customer's web site server 414 issues the request for translation to the translation server 400, which acts as a web translation service. Under this model, the translation server 400 can convert full pages or just specific text segments and/or files. When directly translating text segments or files, multiple translation requests can be issued, one per segment or file, or multiple segments and files can be translated in a single batched request.

FIG. 8 is a block diagram depicting the serving process in a web service model of the translation server 400, according to a preferred embodiment of the present invention. In a first step 802, the user 416 clicks on a link of a web page in a first language on the web site 414. The link points to a page to be translated. The web site server 414 receives the request and processes it. In a second step 804, the web site 414 provides the page to the translation server 400 for translation. In a third step 806, the translation server 400 provides the translated page to the web site 414. In a fourth step 808, the web site 414 sends the translated page to the user 416.

Hosting and Management

The hosting and management model defines who deploys and manages the hardware and operating system software in which the software components of the present invention reside. There are two hosting and management models: hosted & managed, and managed only. Alternatively, the software can be licensed directly to the customer, and the customer is responsible for both the hosting and management.

The hosted and managed model is a fully outsourced model in which one entity hosts the service and all translated data. Under this model, one entity deploys the translation server 400 and WebCATT 408 software on its own hardware. All hardware and software is provisioned and maintained by this entity, so the customer web site 414 has no responsibility for any hardware or software related to the service. In this model, the hosting entity is responsible for: 1) provisioning, installing, configuring and maintaining all hardware, including communication to the Internet 412, 2) installing, configuring and maintaining all operating system, web server and database server software, 3) installing, configuring and managing on an ongoing basis the translation server 400 and WebCATT 408 software and 4) maintaining staff and subcontractors that use the WebCATT 408 software to perform the translations that maintain the alternate language site in sync with the original language site.

In the managed only model, the translation server 400 and WebCATT 408 software are installed on the customer web site's hardware. In this model the customer web site 414 is responsible for: 1) provisioning, installing, configuring and maintaining all hardware, including communication to the Internet 412, and 2) installing, configuring and maintaining all operating system, web server and database server software. The managing entity is responsible for: 1) installing, configuring and managing on an ongoing basis the translation server 400 and WebCATT 408 software, and 2) maintaining staff and subcontractors that use the WebCATT 408 software to perform the translations that maintain the alternate language site in sync with the original language site.

Dedicated vs. Shared Servers

The components of the present invention can be deployed in dedicated or shared server environments. In a shared environment, multiple customer web sites share the same hardware. In a typical scenario, multiple translation servers 400 are installed in the same web server 402, which connects to a database server containing the database 406 of translated data. A single WebCATT 408 software installation may also be shared by multiple customers. This setup is cost efficient and can be used for small and medium size sites with low-to-moderate web site traffic.

In a dedicated environment, all hardware is dedicated to one customer web site 414. This is necessary for large organizations with heavy web site traffic and large amounts of text to be translated. In this case, either a single web server 402 or a cluster of web servers is dedicated to the customer. The database server is also normally dedicated to the customer. Dedicated servers assure guaranteed bandwidth for the customer and simplify keeping track of bandwidth usage for management and billing purposes.

Parsing & Translation

The system of the present invention does not save or maintain translated pages. Although this may be useful for sites with static content, it becomes unmanageable for sites whose content is generated dynamically from database information in response to a user's request. Instead, the present invention stores only those components within a web page that require translation, i.e., translatable components.

Parsing is the process of breaking up an HTML page submitted for translation into its translatable and non-translatable components. Non-translatable components simply pass through the system unchanged (except for URLs that need rewriting). Translatable components are processed and replaced by their translated counterparts if available. There are generally two types of translatable components in a web page: text segments and files. A translatable component includes any one of a text segment, an image file with text to be translated, a multimedia file with text or audio to be translated, a file with text to be translated, a file with an image to be translated, a file with audio to be translated and a file with video with at least one of text and audio to be translated.

A text segment is a chunk of text on the page as defined by the HTML that surrounds it. A text segment can range from a single word to a paragraph or multiple paragraphs. A file is any type of external content that resides in a file, is linked from within the page, and may require translation. Typical types of linked files found in web pages are images, PDF files, MS Word documents and Flash movies.

Below is an example of a very simple HTML page:

<html><head><title>Widget Product Information</title></head><body>Widget <b>Model#123</b><p> This widget is very useful for many chores around the house.<p><img src=“img/widgetpicture.gif” alt=“Product photo”><p><a href=“http://www.abcwidgets.com”>Click here to return to the home page</a></body></html>

The above example page would by default be parsed into the following six text segments: 1) ‘Widget Product Information’, 2) ‘Widget’, 3) ‘Model#123’, 4) ‘This widget is very useful for many chores around the house.’, 5) ‘Product photo’, 6) ‘Click here to return to the home page’. The above example page would further be parsed into the following one file: img/widgetpicture.gif.

By default the parsing system breaks up text segments according to the HTML tags in the page. In the above example, the sentence ‘Widget Model#123’ was broken up into two segments because there was an HTML bold tag (<b>) in the middle of it. However, the parsing system is flexible and allows defining, on a per-customer basis, which HTML tags are formatting tags that should not break up text segments. So if we define the bold tag as a formatting tag, then the example page would instead be parsed into the following five text segments: 1) ‘Widget Product Information’, 2) ‘Widget <b>Model#123</b>’, 3) ‘This widget is very useful for many chores around the house.’, 4) ‘Product photo’, 5) ‘Click here to return to the home page’.
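The tag-driven segmentation described above can be sketched with Python's standard `html.parser` module. This is an illustrative sketch under stated assumptions, not the parser of the present invention: the class name `SegmentParser` and the helper `parse_page` are hypothetical, whitespace is collapsed inside each segment, formatting tags are re-emitted without their attributes, and only the `alt` attribute and `img src` are treated as translatable attribute text and files.

```python
from html.parser import HTMLParser
from typing import Iterable, List, Optional, Tuple


class SegmentParser(HTMLParser):
    """Split HTML into text segments and linked files (hypothetical sketch)."""

    def __init__(self, formatting_tags: Iterable[str] = ()):
        super().__init__(convert_charrefs=True)
        self.formatting_tags = set(formatting_tags)
        self.segments: List[str] = []
        self.files: List[str] = []
        self._buffer: List[str] = []

    def _flush(self) -> None:
        # Collapse whitespace and emit the buffered text as one segment.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.segments.append(text)
        self._buffer = []

    def handle_starttag(self, tag: str,
                        attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag in self.formatting_tags:
            self._buffer.append(f"<{tag}>")  # stays inside the segment
            return
        self._flush()  # any other tag ends the current segment
        values = dict(attrs)
        if tag == "img" and values.get("src"):
            self.files.append(values["src"])
        if values.get("alt"):
            self.segments.append(values["alt"])  # text inside an attribute

    def handle_endtag(self, tag: str) -> None:
        if tag in self.formatting_tags:
            self._buffer.append(f"</{tag}>")
        else:
            self._flush()

    def handle_data(self, data: str) -> None:
        self._buffer.append(data)

    def close(self) -> None:
        super().close()
        self._flush()


def parse_page(html: str, formatting_tags: Iterable[str] = ()
               ) -> Tuple[List[str], List[str]]:
    parser = SegmentParser(formatting_tags)
    parser.feed(html)
    parser.close()
    return parser.segments, parser.files
```

Run on the example page (with straight quotes), `parse_page(page)` yields the six default segments and the single file, while `parse_page(page, {"b"})` yields five segments with ‘Widget <b>Model#123</b>’ kept together.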

The bold tags now become part of the second text segment, allowing the translator to place them in the correct location in the alternate language. For example, translating the text segment ‘Widget <b>Model#123</b>’ to Spanish will result in flipping the order of the ‘Widget’ and ‘Model’ words within the sentence. Since the bold tag is part of the text segment, it can be moved so it still holds the word ‘Model’, as shown: <b>Modelo No. 123</b> del Artefacto

Below is an example of how the example page is converted to Spanish by the translation server 400:

<html><head><title>Informacion del Artefacto</title></head><body><b>Modelo No. 123</b> del Artefacto<p>Este artefacto es muy util para todo tipo de trabajos en la casa.<p><img src=“http://www.trans1.motionpoint.net/img/abcwidgets/ES_24.gif” alt=“Foto del Producto”><p><a href=“http://trans1.motionpoint.net/abcwidgets/enes/?24;http://www.abcwidgets.com”>Haga clic aqui para regresar a la pagina principal</a></body></html>

In order to convert the page, the translation server 400 performed several changes to the page. Each text segment was replaced with a corresponding translation. It is important to note that the text of the image description (‘Product photo’) placed in the ‘alt’ attribute of the image tag was recognized as a text segment and translated. The translation server 400 can recognize text segments inside attributes of HTML tags, such as the text in buttons of a form.

Further, the URL of the image tag was replaced to point to a translated image file. The translation server 400 only executes this action if a translated file has been defined (since many images do not have text and thus do not require translation); otherwise it does not change the URL of the image (except to make the URL absolute if it is not). In this example it is assumed that the ‘ES_24.gif’ image file was defined in WebCATT 408 as the translation for the ‘widgetpicture.gif’ file.
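The image URL rule above (substitute a defined translation, otherwise only make the URL absolute) can be sketched with `urllib.parse.urljoin`. The function name `rewrite_file_url` and keying the table of translated files by absolute original URL are assumptions made for illustration.

```python
from typing import Dict
from urllib.parse import urljoin


def rewrite_file_url(src: str, page_url: str,
                     translated_files: Dict[str, str]) -> str:
    """Return the translated file URL if one is defined, else the absolute URL."""
    # Resolve relative references against the page that contains them.
    absolute = urljoin(page_url, src)
    # Only files with a defined translation are replaced.
    return translated_files.get(absolute, absolute)
```

An image without a defined translation, such as `img/logo.gif` on `http://www.abcwidgets.com/index.html`, simply becomes `http://www.abcwidgets.com/img/logo.gif`.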

The URL of the home page link was rewritten from ‘http://www.abcwidgets.com’ to ‘http://trans1.motionpoint.net/abcwidgets/enes/?24;http://www.abcwidgets.com’ in order to redirect it to the translation server 400. This is done so that when the online user clicks on the ‘Click here to return to the home page’ link, the request will go directly to the translation server 400 and the home page will also be translated. This process is called implicit navigation and it is explained in more detail below.

Implicit Navigation

Implicit navigation is a translation server 400 feature that keeps an online user 416 in the alternate language as he/she browses a web site. Implicit navigation is implemented by rewriting the URLs in the applicable links inside a page as the page is being translated, so they are redirected to the translation server 400. As a result, not only is the page translated, but all applicable links to other translatable pages within the page are also modified, so that when the consumer clicks on a link, the linked page is automatically translated as well.

To rewrite a link, the translation server 400 prefixes the original URL with the URL of the translation server 400, so the original URL becomes the query string to the translation server 400 URL. When a rewritten link is clicked, the request goes to the translation server 400, which reads the query string to obtain the original URL to be translated and requests the page to be translated from this URL. The translation server 400 then converts the page received to the alternate language and delivers the translated page to the consumer directly.

When a link is rewritten, the original URL is only one part of the query string. The other part of the query string is a special numeric action ID, which provides information about the type of conversion request being performed.

The following describes some supported base action IDs. “1” indicates no action. “2” indicates pages that were not translated, or for which the translation did not meet the minimum translation percentage, and therefore should not be returned. “4” indicates HTML to be translated is being submitted as POST data when processing a POST request. If this action is not specified, then the URL passed in the query string is accessed in order to obtain the HTML to be translated. “8” indicates that all relative URLs in the HTML should be converted to absolute URLs. This is necessary only in GET requests. If relative URLs are not used in the document, this action should not be specified. “16” indicates implicit navigation is enabled. “32” indicates the request includes cookie data to be passed back as cookies to all URLs to be translated.

“64” indicates that all links in the page are to be disabled. This overrides action ID “16” if also specified. “128” indicates translation of the page is to be disabled. This is used to process tags without affecting content. “8192” indicates a translation is being requested from WebCATT 408 for previewing. The translation server 400 adds special HTML tags to the web page to allow highlighting translated as opposed to not-translated segments, disabling links to other pages, adding alternate language hover preview features, and allowing editing a segment or file by clicking on it in the preview page.

Actions may be combined by using the sum of the IDs as the action ID. For example, the following illustrates how implicit navigation is performed on a link of a fictional online retailer ABC Widgets: <a href=“http://www.abcwidgets.com/product_listing.jsp?category=TV”>See TV Products Listing</a>

In order to translate the listing page to Spanish, the link is rewritten as follows: <a href=“http://trans1.motionpoint.net/abcwidgets/enes/?24;http://www.abcwidgets.com/product_listing.jsp?category=TV”>See TV Products Listing</a>

In the above example the original URL is: http://www.abcwidgets.com/product_listing.jsp?category=TV

The translation server 400 URL is: http://trans1.motionpoint.net/abcwidgets/enes/

And the action ID is “24” (16 + 8), which means to enable implicit navigation and to convert relative URLs to absolute.
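Because every base action ID is a distinct power of two, summing IDs is equivalent to a bitwise OR, and any combined ID decodes unambiguously. A minimal sketch with Python's `enum.IntFlag` follows; the member names and the helper `links_rewritten` are hypothetical labels for the values listed above.

```python
from enum import IntFlag


class Action(IntFlag):
    """Base action IDs as listed above (member names are illustrative)."""
    NO_ACTION = 1
    NOT_TRANSLATED = 2
    POST_DATA = 4
    ABSOLUTE_URLS = 8
    IMPLICIT_NAVIGATION = 16
    COOKIES = 32
    DISABLE_LINKS = 64
    DISABLE_TRANSLATION = 128
    WEBCATT_PREVIEW = 8192


def links_rewritten(action: Action) -> bool:
    """Implicit navigation applies unless "64" overrides "16"."""
    return (Action.IMPLICIT_NAVIGATION in action
            and Action.DISABLE_LINKS not in action)
```

`Action(24)` decodes to `ABSOLUTE_URLS | IMPLICIT_NAVIGATION`, matching the example link.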

The scope of implicit navigation can be pre-defined by domain and/or URL patterns. In a typical scenario, only pages being served from a specific domain(s) should be translated. In the ABC Widgets example, if the implicit navigation domains are defined as abcwidgets.com and abcwidgets.net, then only URLs within those two domains will be rewritten. If a more granular translation is required, such as when translating only a part of a web site, then URL patterns can be used. For example, if ABC Widgets wishes not to translate the careers and investor relations sections of their site, then the following two example Exclude URL patterns could be used: 1) abcwidgets.com/careers/ and 2) abcwidgets.com/investor/

Any URLs for pages residing within the above two paths would not be rewritten and thus never translated. On the other hand, if ABC Widgets wishes only to translate its online product catalog, then the following example Include URL pattern could be used: abcwidgets.com/catalog/

In that case, only pages residing within the abcwidgets.com/catalog/ path are rewritten and thus translated. Include and Exclude URL patterns may be combined to better define the scope of the translation. Implicit navigation can also be controlled from within the HTML to be translated through the use of directive tags or directive attributes. These are explained in detail below.
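The scoping rules above can be sketched as a single predicate. The matching semantics are assumptions, since the text does not define them precisely: a domain matches its own host and any subdomain (so www.abcwidgets.com falls under abcwidgets.com), a pattern such as abcwidgets.com/careers/ is split into a domain and a path prefix, Include patterns (when present) must match, and Exclude patterns take precedence. The function names are hypothetical.

```python
from typing import Iterable
from urllib.parse import urlsplit


def _host_matches(host: str, domain: str) -> bool:
    # Assumed: a domain covers itself and any of its subdomains.
    return host == domain or host.endswith("." + domain)


def _pattern_matches(host: str, path: str, pattern: str) -> bool:
    # "abcwidgets.com/careers/" -> domain "abcwidgets.com", path "/careers/".
    p_host, _, p_path = pattern.partition("/")
    return _host_matches(host, p_host) and path.startswith("/" + p_path)


def in_scope(url: str, domains: Iterable[str],
             include: Iterable[str] = (), exclude: Iterable[str] = ()) -> bool:
    """Decide whether a link should be rewritten for implicit navigation."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    if not any(_host_matches(host, d) for d in domains):
        return False
    include = list(include)
    if include and not any(_pattern_matches(host, path, p) for p in include):
        return False
    # Assumed: Exclude patterns win over Include patterns.
    return not any(_pattern_matches(host, path, p) for p in exclude)
```

With the ABC Widgets settings, the product listing URL stays in scope while any page under /careers/ is left unrewritten.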

E-Commerce Database Language Enabling

The system of the present invention enables users to access the same original language e-commerce database in multiple languages. Since the translation server 400 processes web pages after they have left the customer web site 414, but before they reach the user 416, it does not affect a web server's e-commerce technology. As a result, the same web site 414 can be accessed in multiple languages, and all users are accessing the same e-commerce database simultaneously.

For example, an auction web site can allow users in different countries to bid on the same item. Each user can view the site and bid on the item in his native language. Since all bids from the different countries are actually hitting the same web site and the same e-commerce engine, all bids occur in real time and each user can see in real time what all the other users in all other countries are bidding.

Text Segment Matching

When looking up a suitable translation for a text segment in an HTML page, a character-by-character comparison of the text in the segment against a database 406 of stored text segments is not ideal because it is very time consuming. As a result, in one embodiment of the present invention, the translation server 400 computes three 64-bit numeric hash codes from each incoming text segment. The hash code function is optimized to spread the resulting hash code across the full range of signed 64-bit numeric values (−9223372036854775808 to 9223372036854775807).

The three hash codes are computed as follows: 1) hash code 1 is based on all characters in the segment, 2) hash code 2 is based on the odd characters in the segment and 3) hash code 3 is based on the even characters in the segment. By distributing the hash code computations in this manner, the chances of key collisions are drastically reduced. The three computed hash codes make a composite key that represents each text segment in the memory caches and in the database. In the unlikely event that multiple text segments are represented by the same composite key, the translation server 400 will then resort to a character-by-character match.
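The composite key can be sketched as follows. The hash function itself is not specified, so 64-bit FNV-1a over UTF-8 bytes is swapped in here as a stand-in, with the result reinterpreted as a signed 64-bit value to match the range above. "Odd" characters are assumed to mean the 1st, 3rd, 5th, ... characters (`segment[0::2]`) and "even" the 2nd, 4th, ... (`segment[1::2]`). `SegmentStore` is a hypothetical in-memory stand-in for the caches and database 406; it trusts the key when a bucket holds a single entry and falls back to character-by-character comparison only when several segments share a key, as the text describes.

```python
from typing import Dict, List, Optional, Tuple

MASK64 = (1 << 64) - 1
FNV_OFFSET = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3

Key = Tuple[int, int, int]


def fnv1a64(text: str) -> int:
    """64-bit FNV-1a (a stand-in hash), returned as a signed 64-bit int."""
    h = FNV_OFFSET
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * FNV_PRIME) & MASK64
    return h - (1 << 64) if h >= (1 << 63) else h


def composite_key(segment: str) -> Key:
    # All characters, "odd" (1st, 3rd, ...) and "even" (2nd, 4th, ...).
    return (fnv1a64(segment), fnv1a64(segment[0::2]), fnv1a64(segment[1::2]))


class SegmentStore:
    def __init__(self) -> None:
        self._buckets: Dict[Key, List[Tuple[str, str]]] = {}

    def add(self, source: str, translation: str) -> None:
        self._buckets.setdefault(composite_key(source), []).append(
            (source, translation))

    def lookup(self, source: str) -> Optional[str]:
        bucket = self._buckets.get(composite_key(source), [])
        if len(bucket) == 1:
            return bucket[0][1]  # unique key: no string comparison needed
        for stored, translation in bucket:  # shared key: compare text
            if stored == source:
                return translation
        return None
```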

Text Segment Locking

Occasionally, the meaning of a word or phrase may change depending on the context in which it is used. It is also possible that the translation itself may vary depending on the context or placement of a text segment, even if the original meaning does not change. As a result, it may be necessary to specify multiple translations for the same word or phrase, one for each usage context. The text segment locking feature allows translators to do this by providing the ability to “lock” translated text segments together. When two or more translated text segments are locked together, they are used only when the exact translation sequence is followed.

For example, the translation to Spanish of the text segment “Virtual Brochures” can vary, depending on where it is used. Below is this segment used in an English HTML sentence: <b>Virtual Brochures</b> are great. The corresponding translation to Spanish is: <b>Los Folletos Virtuales</b> son excelentes. Another example of the segment used in an English HTML sentence: There are many great <b>Virtual Brochures</b>. The corresponding translation to Spanish is: Hay muchos excelentes <b>Folletos Virtuales</b>.

For this example, we assume that the HTML bold (<b>) tag is not defined as a formatting tag and, therefore, forces each sentence above to be broken up into two text segments. As a result, the phrase “Virtual Brochures” becomes a separate text segment that requires a different translation for each case. Using the text segment locking feature in WebCATT 408, the translator locks the “Los Folletos Virtuales” translated segment with the “son excelentes” translated segment in the first sentence, and the “Hay muchos excelentes” translated segment with the “Folletos Virtuales” translated segment in the second sentence.

At conversion time, when the translation server 400 encounters the “Virtual Brochures” segment in the first sentence, it looks up a corresponding translated segment and gets back two potential matches: “Los Folletos Virtuales” and “Folletos Virtuales”. It then proceeds to look up a translated segment for the next segment “are great” and gets back “son excelentes”. Since “son excelentes” is locked to “Los Folletos Virtuales”, the translation server 400 is able to determine that “Los Folletos Virtuales” is the correct translation of the previous segment “Virtual Brochures”.
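The lock-based disambiguation can be sketched as follows. Representing a lock as an ordered pair (earlier translation, later translation) and the function names are assumptions for illustration. For each source segment the sketch prefers a candidate locked to the translation already chosen for the previous segment, then one locked to a candidate of the next segment, and otherwise falls back to an unlocked candidate so that locked translations are not used out of sequence.

```python
from typing import Dict, List, Optional, Set, Tuple

Lock = Tuple[str, str]  # (earlier translated segment, later translated segment)


def _pick(options: List[str], prev: Optional[str],
          next_options: List[str], locks: Set[Lock]) -> str:
    # 1) A candidate locked to the previous segment's chosen translation.
    if prev is not None:
        for opt in options:
            if (prev, opt) in locks:
                return opt
    # 2) A candidate locked to some candidate of the next segment.
    for opt in options:
        if any((opt, nxt) in locks for nxt in next_options):
            return opt
    # 3) Otherwise prefer a candidate that takes part in no lock.
    locked = {t for pair in locks for t in pair}
    unlocked = [opt for opt in options if opt not in locked]
    return unlocked[0] if unlocked else options[0]


def resolve_locked(sources: List[str], candidates: Dict[str, List[str]],
                   locks: Set[Lock]) -> List[str]:
    chosen: List[str] = []
    for i, src in enumerate(sources):
        options = candidates.get(src, [src])  # untranslated: keep source
        nxt = candidates.get(sources[i + 1], []) if i + 1 < len(sources) else []
        chosen.append(_pick(options, chosen[-1] if chosen else None, nxt, locks))
    return chosen
```

With the two locks from the example, the first sentence resolves to “Los Folletos Virtuales” + “son excelentes” and the second to “Hay muchos excelentes” + “Folletos Virtuales”.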

Form Posting

The translation server 400 transparently handles form submissions via GET or POST methods. This means that all form data is forwarded to the original URL that processes the form and that the response page is converted to the alternate language. The first step in the form handling is performed when an HTML page that has a form in it is being converted.

If the form is submitted via the POST method, then the translation server 400 simply rewrites the URL in the ACTION attribute of the <FORM> tag. This is done by prefixing the original URL with the URL of the translation server 400, so the original URL becomes the query string to the translation server 400 URL, much like the implicit navigation feature in standard links. The browser will perform the POST request to the translation server 400, which will read the query string to obtain the original URL where the form is to be submitted and perform the POST to that URL, forwarding all form data to it. The translation server 400 then reads the response page, converts it to the alternate language, and delivers the translated page to the user directly.

If the form is submitted via the GET method, then the translation server 400 cannot simply rewrite the URL in the ACTION attribute of the <FORM> tag, because in a GET submission the form data is sent in the query string. As a result, the browser would replace the query string of the rewritten ACTION URL, which carries the original URL, with the form data, and the translation server 400 would not know to what URL to submit the form data. To overcome this limitation, the translation server 400 adds a hidden field to the form whose value contains the original URL, and replaces the URL in the ACTION attribute of the <FORM> tag so the request is sent to the translation server 400. The browser will perform the GET submission to the translation server 400, which will read the value of the hidden form field to obtain the original URL where the form is to be submitted and perform the GET submission to that URL, forwarding all form data to it. The translation server 400 then reads the response page, converts it to the alternate language, and delivers the translated page to the consumer directly.
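The two form strategies can be sketched side by side. The hidden field name `mp_original_url` is hypothetical (the text does not name the field), as are the function names; how the action ID travels with a GET submission is not described, so this sketch leaves it out for GET.

```python
from typing import Dict, Tuple

SERVER = "http://trans1.motionpoint.net/abcwidgets/enes/"
ORIGINAL_URL_FIELD = "mp_original_url"  # hypothetical hidden-field name


def rewrite_form(action_url: str, method: str,
                 action_id: int = 24) -> Tuple[str, Dict[str, str]]:
    """Return (new ACTION URL, hidden fields to add to the form)."""
    if method.lower() == "post":
        # POST: the query string survives, so prefix it like a link.
        return f"{SERVER}?{action_id};{action_url}", {}
    # GET: the browser replaces the query string with the form data,
    # so the original URL travels in a hidden field instead.
    return SERVER, {ORIGINAL_URL_FIELD: action_url}


def resolve_get_submission(query: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Split an incoming GET query into (original URL, real form data)."""
    fields = dict(query)
    original = fields.pop(ORIGINAL_URL_FIELD)
    return original, fields
```

On receiving the GET request, the server would forward the remaining fields to the recovered URL, for example by appending them with `urllib.parse.urlencode`.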

JavaScript/VBScript Handling

The translation server 400 is capable of translating text segments and files located inside JavaScript or VBScript code. Common types of files can be recognized automatically by their standard extensions. The translation server 400 parses all JavaScript code blocks and replaces the URLs of all files for which a translation exists so they point to the translated files. Non-standard file extensions and URL patterns may be defined on a per-customer basis to allow the translation server 400 to recognize less common or proprietary file formats, or even dynamically generated files. File recognition and translation can also be controlled from within the JavaScript code through the use of directive tags. These are explained in detail below. Text segments inside script code that require translation must be explicitly identified by placing a set of directive tags around the text.
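Recognizing files inside script code by their extensions can be sketched with a regular expression over quoted string literals. This is a simplified illustration, not the JavaScript parser of the present invention: it only looks at single- or double-quoted literals without whitespace, the default extension list is an assumed example, and `replace_script_files` is a hypothetical name. Per-customer extensions can be passed in through the `extensions` parameter.

```python
import re
from typing import Dict, Iterable

# Assumed example list of "standard" extensions.
DEFAULT_EXTENSIONS = ("gif", "jpg", "jpeg", "png", "pdf", "swf", "doc")


def replace_script_files(script: str, translated: Dict[str, str],
                         extensions: Iterable[str] = DEFAULT_EXTENSIONS) -> str:
    """Point quoted file URLs with a known translation at the translated file."""
    ext = "|".join(re.escape(e) for e in extensions)
    # A quote, a whitespace-free URL ending in a known extension, same quote.
    pattern = re.compile(r"""(["'])([^"'\s]+\.(?:%s))\1""" % ext, re.IGNORECASE)

    def swap(match: "re.Match[str]") -> str:
        quote, url = match.group(1), match.group(2)
        return quote + translated.get(url, url) + quote

    return pattern.sub(swap, script)
```

Literals with no defined translation, or with an unrecognized extension, are left untouched.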

Translation of content inside JavaScript or VBScript include files is also supported. A script include file is downloaded by the browser in a separate HTTP request and included in the web page as if it had appeared within the page. Include files are handled in the same manner as implicit navigation in standard links within the page. The URL of the include file is rewritten so the original include file URL is prefixed with the URL of the translation server 400 and the original file URL becomes the query string to the translation server 400 URL. The browser will then request the include file from the translation server 400, which will read the query string to obtain the URL of the original include file and request it from its location. The translation server 400 then reads the file, performs the appropriate conversions, and delivers the modified file to the browser for inclusion in the web page.

JavaScript include files are specified using the source (src) attribute in the <SCRIPT> tag, as shown: <script language=“javascript” src=“menu.js”></script>

Shown is an example of how the above script tag is rewritten so the content inside the JavaScript include file is translated: <script language=“javascript” src=“http://trans1.motionpoint.net/abcwidgets/enes/?24;http://www.abcwidgets.com/menu.js”></script>

Directive Tags and Attributes

Directive tags and directive attributes are special HTML tags and attributes that allow more granular control over the translation and implicit navigation within a web page. Directive tags are special HTML comment tags that are ignored by the browser but provide specific instructions to the translation server 400. Directive attributes are specially named attributes placed within an HTML tag that are also ignored by the browser but provide specific instructions to the translation server 400 that apply only to the tag in which the attribute is placed.

Translation control tags and attributes are used to specify sections of a web page that should not be translated. One important use of translation control tags is to delimit personal information, such as a person's name, address, credit card numbers, etc., that may show up in a web page but should not be processed: for security and privacy reasons, such information simply passes through the translation server 400 without being translated or stored.

Following is an exemplary list of directive tags. The directive tag “mp_trans_partial_start & mp_trans_partial_end” signals the start and end of a partial translation section. This tag may be used at the top of a web page in conjunction with section translate tags to selectively translate sections of a page. The directive tag “mp_trans_enable_start & mp_trans_enable_end” signals the start and end of a section to be translated within a partial translation section. All text and files within this section are translated. The directive tag “mp_trans_disable_start & mp_trans_disable_end” signals the start and end of a section not to be translated when in normal translation mode. The directive tag “mp_trans_machine_start & mp_trans_machine_end” signals that any text segments enclosed within the tags may be machine translated in the event that a human translation is not available.

Following is a list of directive attributes. The directive attribute “mpdistrans” disables translation of a file or of translatable text in a tag, such as an alt attribute, a keywords or description meta tag, or a form button.

Below is an example of usage of translation control directive tags and attributes:

<html><head><meta name=“description” content=“This page description is translated”><meta mpdistrans name=“keywords” content=“These keywords are not translated, keyword1, keyword2, keyword3, keyword4, keyword5”><title> This title is translated</title></head><body>

This text and the image widget1.gif below are translated.

<img src=“img/widget1.gif” alt=“This image description is translated”><p><img mpdistrans src=“img/widget2.gif” alt=“This image and this description are NOT translated because of the mpdistrans attribute”><!-- mp_trans_disable_start -->

This text and the image widget3.gif below are NOT translated because they are inside a translation disabled section.

<img src=“img/widget3.gif” alt=“This image description is NOT translated”><!-- mp_trans_disable_end --> This text is translated.<!-- mp_trans_partial_start --> This text is NOT translated because it is inside a partially translated section and not specifically designated as translatable content.<!-- mp_trans_enable_start --> This text is translated because it is inside a partially translated section and it is specifically designated as translatable content.<!-- mp_trans_enable_end --> This text is NOT translated because it is inside a partially translated section and not specifically designated as translatable content.<!-- mp_trans_partial_end --> This text is translated.</body></html>
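How the disable, partial and enable directive tags combine can be sketched as a small state machine over the HTML comments. This hypothetical `classify_sections` function only labels the text between directives as translatable or not; it ignores the machine-translation directive and the `mpdistrans` attribute, and treats an enable section outside a partial section as ordinary translatable text.

```python
import re
from typing import List, Tuple

DIRECTIVE = re.compile(r"<!--\s*(mp_trans_\w+)\s*-->")


def classify_sections(html: str) -> List[Tuple[str, bool]]:
    """Label each stretch of content between directives as translatable or not."""
    result: List[Tuple[str, bool]] = []
    disabled = 0          # nesting depth of disable sections
    partial = False       # inside mp_trans_partial_start ... _end
    enabled = False       # inside mp_trans_enable_start ... _end

    def emit(chunk: str) -> None:
        if chunk.strip():
            translatable = disabled == 0 and (not partial or enabled)
            result.append((chunk.strip(), translatable))

    pos = 0
    for match in DIRECTIVE.finditer(html):
        emit(html[pos:match.start()])
        name = match.group(1)
        if name == "mp_trans_disable_start":
            disabled += 1
        elif name == "mp_trans_disable_end":
            disabled = max(0, disabled - 1)
        elif name == "mp_trans_partial_start":
            partial = True
        elif name == "mp_trans_partial_end":
            partial = False
        elif name == "mp_trans_enable_start":
            enabled = True
        elif name == "mp_trans_enable_end":
            enabled = False
        pos = match.end()
    emit(html[pos:])
    return result
```

Applied to a condensed version of the example body, the chunks alternate between translatable and not translatable exactly as the sample text states.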

Following is a list of directive attributes for implicit navigation control. The directive attribute “mpnav” enables implicit navigation for the listed attributes in the tag. This attribute can be used for tags whose attributes do not normally contain URLs but do in a particular page. The directive attribute “mpdisnav” disables implicit navigation for all attributes or only the listed attributes of the tag. The directive attribute “mporgnav” forces original navigation for all attributes or only the listed attributes of the tag. Original navigation removes redirection to the translation server if found; otherwise it leaves the link intact. This directive attribute is discussed below with reference to one-link deployment.

Below is an example of usage of implicit navigation control directive attributes.

<html><body>ABC Widgets Home Page<p><a href=“widgets.jsp”>See all useful widgets</a><p><a mpdisnav href=“uselesswidgets.jsp”>See useless widgets</a><p><form action=“showwidget.jsp” method=“post”><select name=“WidgetSel”><option value SELECTED>Select a widget to view:</option><option mpnav=“value” value=“widget1.jsp”>Widget 1</option><option mpnav=“value” value=“widget2.jsp”>Widget 2</option></select></form></body></html>

The translation server 400 would process the above page as follows:

<html><body>Pagina Principal de ABC Widgets<p><a href=“http://trans1.motionpoint.net/abcwidgets/enes/?24;http://www.abcwidgets.com/widgets.jsp”>Ver artefactos utiles</a><p><a mpdisnav href=“uselesswidgets.jsp”>Ver artefactos inutiles</a><p><form action=“http://trans1.motionpoint.net/abcwidgets/enes/?24;http://www.abcwidgets.com/showwidget.jsp” method=“post”><select name=“WidgetSel”><option value SELECTED>Escoja un artefacto para verlo:</option><option mpnav=“value” value=“http://trans1.motionpoint.net/abcwidgets/enes/?24;http://www.abcwidgets.com/widget1.jsp”>Artefacto 1</option><option mpnav=“value” value=“http://trans1.motionpoint.net/abcwidgets/enes/?24;http://www.abcwidgets.com/widget2.jsp”>Artefacto 2</option></select></form></body></html>

It can be seen above that implicit navigation was not performed for the anchor (<A>) tag with the mpdisnav attribute. As a result, when the user clicks on the ‘Ver artefactos inutiles’ link, the uselesswidgets.jsp web page is not redirected to the translation server 400, and therefore it is not translated. Furthermore, the mpnav attribute placed in the two <OPTION> tags instructed the translation server 400 to perform implicit navigation on the URL specified in the value attribute of each tag.
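The per-tag decision described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the translation server's code: tags are assumed to be already parsed into a name and an attribute object, the prefix is copied from the example output (its ‘?24;’ token is treated as opaque), and DEFAULT_URL_ATTRS and every function name are hypothetical. As in the example output, a URL whose implicit navigation is disabled is left exactly as written.

```javascript
// Assumed translation-server prefix, copied from the example output;
// the "?24;" token is treated as opaque.
const PROXY_PREFIX = "http://trans1.motionpoint.net/abcwidgets/enes/?24;";

// Hypothetical table of the attributes that normally carry URLs for each tag.
const DEFAULT_URL_ATTRS = { a: ["href"], form: ["action"], img: ["src"] };

// A bare directive ("") means "all attributes"; otherwise it lists names.
function listedAttrs(directiveValue) {
  const trimmed = directiveValue.trim();
  return trimmed === "" ? null : trimmed.split(/\s+/);
}

// Work out which attributes of one tag receive implicit navigation.
function navigableAttrs(tagName, attrs) {
  let names = new Set(DEFAULT_URL_ATTRS[tagName.toLowerCase()] || []);
  if ("mpnav" in attrs) {
    // mpnav adds attributes that do not normally hold URLs, e.g. <option value>.
    for (const name of listedAttrs(attrs.mpnav) || []) names.add(name);
  }
  if ("mpdisnav" in attrs) {
    const disabled = listedAttrs(attrs.mpdisnav);
    names = disabled === null
      ? new Set() // bare mpdisnav: no implicit navigation at all
      : new Set([...names].filter((name) => !disabled.includes(name)));
  }
  return names;
}

// Resolve a possibly relative URL against the page URL and route it through
// the translation server, unless it is already routed there.
function implicitNavigate(href, pageUrl, prefix = PROXY_PREFIX) {
  if (href.startsWith(prefix)) return href;
  return prefix + new URL(href, pageUrl).href;
}

// Return a copy of the tag's attributes with navigable URLs rewritten.
function rewriteTag(tagName, attrs, pageUrl) {
  const result = { ...attrs };
  for (const name of navigableAttrs(tagName, attrs)) {
    if (typeof result[name] === "string" && result[name] !== "") {
      result[name] = implicitNavigate(result[name], pageUrl);
    }
  }
  return result;
}
```

For example, rewriteTag("option", { mpnav: "value", value: "widget1.jsp" }, "http://www.abcwidgets.com/") produces a value attribute of the same form as the Artefacto 1 option in the translated page.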

Following is a list of directive tags for JavaScript/VBScript control. The directive tags “mp_trans_textjs_start” and “mp_trans_textjs_end” signal the start and end of a section inside a script block that contains text to be translated. The directive tags “mp_trans_imgjs_start” and “mp_trans_imgjs_end” signal the start and end of a section inside a script block that contains images, PDF, Flash or other files to be translated. Under most circumstances these tags are not needed, as the JavaScript parser of the translation server 400 can automatically recognize common types of files by their standard extensions.

The directive tags “mp_trans_supressurljs_start” and “mp_trans_supressurljs_end” signal the start and end of a section inside a script block in which the processing of URLs is inhibited. URLs are normally processed for implicit navigation, or converted from relative to absolute URLs if implicit navigation is disabled. These tags may be necessary to avoid processing partial URLs that are used to build up a final URL by means of concatenation.

Below is an example of usage of script control directive tags:

<script language="Javascript">
<!--
function CheckLoginForm() {
  <!-- mp_trans_textjs_start -->
  var usermsg = "User name is required\n";
  var pswdmsg = "Password is required\n";
  var hdrmsg = "Please correct the following errors:\n";
  <!-- mp_trans_textjs_end -->
  var message = "";
  // Compare (==) rather than assign (=) when checking for empty fields.
  if (document.LoginForm.login_user.value == "") {
    message = message + usermsg;
  }
  if (document.LoginForm.login_pass.value == "") {
    message = message + pswdmsg;
  }
  if (message == "") {
    document.LoginForm.submit();
  } else {
    message = hdrmsg + message;
    alert(message);
  }
}
//-->
</script>

The above CheckLoginForm function verifies that an online user has entered a login name and password before posting the LoginForm form in the page. If a user has not entered the required information, a pop-up alert box shows an error message with details. The text of the various error messages is assigned to variables and enclosed in a set of ‘mp_trans_textjs’ directive tags so it can be recognized and translated.
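How a parser might pick out the marked messages can be sketched as below. This is a minimal sketch assuming the directive tags appear as HTML-style comments inside the script, as in the example, and that the translatable text sits in double-quoted string literals; the function names are hypothetical, and single-quoted or template literals are deliberately ignored.

```javascript
// Sections delimited by the text directive tags (whitespace inside the
// comment markers is optional).
const SECTION_RE =
  /<!--\s*mp_trans_textjs_start\s*-->([\s\S]*?)<!--\s*mp_trans_textjs_end\s*-->/g;
// Double-quoted JavaScript string literals, allowing backslash escapes.
// Single-quoted and template literals are ignored in this sketch.
const STRING_RE = /"((?:[^"\\]|\\.)*)"/g;

// Collect the bodies of the string literals inside marked sections.
function extractTranslatableStrings(scriptText) {
  const found = [];
  for (const section of scriptText.matchAll(SECTION_RE)) {
    for (const literal of section[1].matchAll(STRING_RE)) {
      found.push(literal[1]);
    }
  }
  return found;
}

// Replace each marked literal with its translation. translate() receives the
// raw literal body and must return text already escaped for a double-quoted
// JavaScript string.
function translateMarkedStrings(scriptText, translate) {
  return scriptText.replace(SECTION_RE, (section) =>
    section.replace(STRING_RE, (_literal, body) => `"${translate(body)}"`)
  );
}
```

Literals outside the marked section, such as the empty message string in CheckLoginForm, are left untouched.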

“One-Link” Deployment

One of the primary goals of the TransMotion system is to eliminate or minimize the workload of a customer web site's IT department in deploying an alternate language web site. The one-link deployment feature allows a customer to deploy the alternate language web site by simply placing one language-switching link in the home page of the original language site.

The one-link deployment is a combination of two features: (1) automatic flipping of the language-switching link, and (2) implicit navigation to keep the user in the alternate language.

Automatic flipping of the language-switching link is specified by using the mporgnav directive attribute in the language-switching link. The mporgnav directive attribute instructs the translation server 400 to rewrite the URL to support automatic language switching.

Below is an example of a very simple home page:

<html><body>
Welcome to the ABC Widgets Home Page<p>
<a href="widgets.jsp">Click here to see all widgets we sell</a>
</body></html>

In order to deploy a mirror Spanish language web site, all that has to be done is to place one link in the home page that redirects the home page to ABC Widgets' translation server 400. Below is an example of the above home page with the new language-switching link added:

<html><body>
Welcome to the ABC Widgets Home Page<p>
<a mporgnav href="http://trans1.motionpoint.net/abcwidgets/enes/?24;http://www.abcwidgets.com">Click here to see this site in Spanish</a><p>
<a href="widgets.jsp">Click here to see all widgets we sell</a>
</body></html>

When a user clicks the ‘Click here to see this site in Spanish’ language-switching link, the translation server 400 returns the translated home page, as shown below:

<html><body>
Bienvenidos a la Pagina Principal de ABC Widgets<p>
<a mporgnav href="http://www.abcwidgets.com">Haga clic aqui para ver este sitio web en Ingles</a><p>
<a href="http://trans1.motionpoint.net/abcwidgets/enes/?24;http://www.abcwidgets.com/widgets.jsp">Haga clic aqui para ver todos los artefactos que vendemos</a>
</body></html>

As shown above, in addition to translating the page, the translation server 400 also rewrites the URL in the language-switching link and performs implicit navigation of all other URLs in the page. The translation server 400 rewrites the URL in the language-switching link so that the redirection to the translation server 400 is removed; the mporgnav directive attribute instructs the translation server 400 to do this. In addition, the link text ‘Click here to see this site in Spanish’ is translated as ‘Haga clic aqui para ver este sitio web en Ingles’ (which means ‘Click here to see this site in English’). This automatic and simultaneous change of both the URL and the text (or image) in the language-switching link by the translation server 400 is what allows the user to flip back and forth between English and Spanish.
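The URL side of the flip can be sketched as a pass over the anchors of the translated page: links carrying mporgnav lose the redirection, and all other links gain it. This is a minimal sketch, not the server's implementation; the prefix is copied from the example, the helper names are hypothetical, and a regular expression stands in for a real HTML parser (only double-quoted href values are handled). The link text itself is translated like any other text segment.

```javascript
// Assumed translation-server prefix, copied from the example above.
const PREFIX = "http://trans1.motionpoint.net/abcwidgets/enes/?24;";

// Original navigation: strip the redirection if present, else leave intact.
function originalNavigate(href, prefix = PREFIX) {
  return href.startsWith(prefix) ? href.slice(prefix.length) : href;
}

// Implicit navigation: make the URL absolute and route it through the server.
function implicitNavigate(href, pageUrl, prefix = PREFIX) {
  return href.startsWith(prefix) ? href : prefix + new URL(href, pageUrl).href;
}

// Rewrite the href of every <a> tag in a translated page. A regular
// expression keeps the sketch short; a real implementation would use an
// HTML parser and handle single-quoted or unquoted attributes.
function rewriteAnchors(html, pageUrl) {
  return html.replace(/<a\b[^>]*>/gi, (tag) => {
    const orgnav = /\smporgnav\b/i.test(tag);
    return tag.replace(/\bhref="([^"]*)"/i, (_match, href) => {
      const rewritten = orgnav ? originalNavigate(href) : implicitNavigate(href, pageUrl);
      return `href="${rewritten}"`;
    });
  });
}
```

Applied to the Spanish home page, the language-switching link points back to http://www.abcwidgets.com while the widgets.jsp link is routed through the translation server, so each click lands the user in the other language.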

Implicit navigation is also performed on all the links in the page. In the above example home page, it was performed on the link to the widgets.jsp page. As a result, when a user clicks on this rewritten link, the widgets.jsp page is in turn translated, and implicit navigation is performed on all of its links within the abcwidgets.com domain. This process repeats, so that the user is always navigating the site in the alternate language.

Customized Content

The translation server 400 allows delivering customized content according to the language and/or locale in which a user is viewing the site. When the translation server 400 requests a web page for translation, it sends two cookies, called ‘mptranslan’ and ‘mptranscty’, to the original web server. The value of the ‘mptranslan’ cookie is a 2- or 3-letter (upper-case) language code in compliance with the ISO 639 standard. The value of the ‘mptranscty’ cookie is a 2-letter (upper-case) country code in compliance with the ISO 3166 standard.

Web site server software can determine whether a page is being viewed in an alternate language and/or for a different country by checking for these cookies. For example, by checking that the ‘mptranslan’ cookie exists and that its value is ‘ES’, a web server can determine that a page is being served in Spanish and customize the content being served, such as by showcasing items that appeal more to Hispanics. In addition, if a company maintains operations in multiple countries, it can use the ‘mptranscty’ cookie to determine the country and show only products sold or shipped to that country.
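On the original web server, the check can be sketched as a small helper that reads the raw Cookie request header. This is a minimal sketch; the function names are hypothetical, and values that do not have the documented shape (2 or 3 upper-case letters for the language, 2 for the country) are treated as absent.

```javascript
// Parse a raw Cookie request header into a name -> value object.
function parseCookies(header) {
  const cookies = {};
  for (const part of (header || "").split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue;
    const name = part.slice(0, eq).trim();
    if (name !== "") cookies[name] = part.slice(eq + 1).trim();
  }
  return cookies;
}

// Read the language (ISO 639, 2 or 3 letters) and country (ISO 3166,
// 2 letters) cookies sent by the translation server.
function translationContext(cookieHeader) {
  const cookies = parseCookies(cookieHeader);
  const lang = cookies.mptranslan || "";
  const cty = cookies.mptranscty || "";
  const language = /^[A-Z]{2,3}$/.test(lang) ? lang : null;
  const country = /^[A-Z]{2}$/.test(cty) ? cty : null;
  return { language, country, translated: language !== null };
}
```

In a Node.js http handler the header would come from req.headers.cookie; a page could then test ctx.language === "ES" to showcase different items, or ctx.country to filter products by shipping destination.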

Internal Search Engine Integration

When an online user 416 who is viewing a web site 414 in an alternate language performs an internal site search, it is natural for the user to enter the search keyword(s) in the alternate language. When the translation server 400 forwards the search keyword(s) to the original web site, the search engine will not be able to find any matching results, or might deliver incorrect results. This occurs because the web server search engine is matching keyword(s) in the alternate language against a search index of keywords in the original language.

The translation server 400 provides an elegant solution to this problem by performing a real-time reverse machine translation of the search keyword(s) and forwarding the keyword(s) to the web server search engine in the original language. Reverse machine translation is configured so that it is performed only on the specific keyword field(s) of the search form(s) in a web site.
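Restricting reverse translation to the configured keyword fields can be sketched as below. This is a minimal sketch: reverseTranslate is a hypothetical stand-in for a machine-translation call from the alternate language back to the original one (a real call would usually be asynchronous), and the field names in the usage are illustrative only.

```javascript
// Build the form data forwarded to the original site's search engine.
// fields: submitted form fields (name -> value).
// keywordFields: names of the fields configured for reverse translation.
// reverseTranslate: hypothetical stand-in for a machine-translation call
// from the alternate language back to the original language.
function prepareSearchRequest(fields, keywordFields, reverseTranslate) {
  const forwarded = {};
  for (const [name, value] of Object.entries(fields)) {
    const isKeyword = keywordFields.includes(name) && value.trim() !== "";
    // Only configured keyword fields are translated; everything else
    // (paging, sort order, category codes) is forwarded unchanged.
    forwarded[name] = isKeyword ? reverseTranslate(value) : value;
  }
  return forwarded;
}
```

Limiting translation to named fields avoids corrupting hidden or coded form values that the search engine expects verbatim.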

Internet Search Engine Compatibility

The system of the present invention is compatible with all Internet search engines, such as Google or AltaVista. These search engines utilize content from both the body and the head of the HTML document to index a web page. To ensure transparent compatibility with Internet search engines, the system of the present invention translates all applicable text in the head of the document. This includes the page title, the page description meta-tag and the keywords meta-tag.
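Collecting the head text named above (title, description meta-tag, keywords meta-tag) can be sketched as follows. This is a minimal sketch with a hypothetical function name; regular expressions stand in for an HTML parser and assume double-quoted attribute values.

```javascript
// Collect the head text that search engines index: the title and the
// description and keywords meta-tags. Regular expressions keep this sketch
// short; they assume double-quoted attribute values.
function headTranslatables(html) {
  const items = [];
  const title = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i);
  if (title) items.push({ kind: "title", text: title[1].trim() });
  for (const [tag] of html.matchAll(/<meta\b[^>]*>/gi)) {
    const name = tag.match(/\bname\s*=\s*"([^"]*)"/i);
    const content = tag.match(/\bcontent\s*=\s*"([^"]*)"/i);
    const kind = name ? name[1].toLowerCase() : "";
    if (content && (kind === "description" || kind === "keywords")) {
      items.push({ kind, text: content[1] });
    }
  }
  return items;
}
```

Other meta-tags (character set, robots directives and the like) are skipped because they carry no translatable text.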

Integration with Machine Translation

The translation server 400 can use real-time machine translation in the event that a human translation is not yet available for a text segment. This is an optional setting that can be specified per customer, per URL pattern and/or by means of directive tags.
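The fallback and its three configuration levels can be sketched as follows. This is a minimal sketch under assumptions: the source does not say how the three levels interact, so the precedence used here (directive tag, then the first matching URL pattern, then the customer default) is hypothetical, as are the "on"/"off" directive values, the settings shape and the function names.

```javascript
// Decide whether machine translation may be used for a page. The precedence
// (directive tag, then URL pattern, then customer default) is an assumption.
function machineTranslationAllowed(url, settings, directive) {
  if (directive === "on") return true; // hypothetical directive values
  if (directive === "off") return false;
  for (const rule of settings.urlRules) {
    if (rule.pattern.test(url)) return rule.enabled; // first match wins
  }
  return settings.customerDefault;
}

// Prefer a human translation; fall back to machine translation when allowed,
// otherwise to the original text.
function translateSegment(text, humanTranslations, mtAllowed, machineTranslate) {
  if (humanTranslations.has(text)) {
    return { text: humanTranslations.get(text), source: "human" };
  }
  if (mtAllowed) return { text: machineTranslate(text), source: "machine" };
  return { text, source: "original" };
}
```

Returning the source alongside the text lets a caller, for example, keep scheduling machine-translated segments for later human translation.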

Efficient Caching

Caching frequently used data in memory is necessary to minimize round trips to the database 406. Two types of caches are used: dynamic and static. A dynamic cache is one whose entries are removed from the cache when memory becomes scarce; it uses a Most-Recently-Used (MRU) ordering to keep the most relevant entries in the cache, evicting the least recently used entries first. Managing the cache in this way keeps the most recently used entries, which are typically also the most frequently accessed ones, in the cache. This type of cache is used for large, long-lived caches.
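A dynamic cache with this ordering can be sketched on top of a JavaScript Map, whose iteration order is insertion order. This is a minimal sketch: the source evicts entries when memory becomes scarce, whereas this sketch assumes a fixed maximum entry count as a stand-in for that condition, and the class name is hypothetical.

```javascript
// A dynamic cache sketch. Re-inserting an entry on every access keeps the
// Map ordered from least to most recently used. A fixed entry limit stands
// in for "memory becomes scarce" (an assumption).
class DynamicCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = new Map();
  }

  has(key) {
    return this.entries.has(key); // does not count as a use
  }

  get(key) {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key);
    this.entries.delete(key); // move to the most-recently-used end
    this.entries.set(key, value);
    return value;
  }

  set(key, value) {
    this.entries.delete(key);
    this.entries.set(key, value);
    while (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
  }

  delete(key) {
    return this.entries.delete(key);
  }
}
```

A static cache, by contrast, is just a Map with no eviction: entries leave only when the code deletes them explicitly.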

In a static cache, entries cannot be removed automatically when memory becomes scarce. This type of cache is normally used for small, short-lived caches, but is also used for long-lived caches that will not grow too large and whose entries must remain in the cache. The translation server 400 contains five memory caches, which are described in more detail below.

A main segment cache is a dynamic long-lived cache that stores ACTIVE translated text segments keyed by the composite key derived from the original (not yet translated) text segment's 64-bit hash codes. This allows a quick lookup of translated text. Segments are removed from this cache if they are deactivated in the WebCATT 408. A translation queue segment cache is a dynamic long-lived cache that stores the text segments of all pages that are in the translation queue. This allows the translation server 400 to determine that a specific text segment that has not yet been translated is already in the queue for translation, without having to search the database. Segments are removed from this cache when they are activated in the WebCATT 408.
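The interplay of the two segment caches can be sketched as below. This is a minimal sketch under assumptions: the source specifies 64-bit hash codes but not the hash function, so FNV-1a (64-bit) is a swapped-in choice; the composite key format, the status names and the class name are hypothetical; and a plain Map and Set stand in for the dynamic caches.

```javascript
// FNV-1a (64-bit) is a swapped-in choice: the source says only that 64-bit
// hash codes are used, not which hash function.
const FNV_OFFSET_64 = 0xcbf29ce484222325n;
const FNV_PRIME_64 = 0x100000001b3n;
const MASK_64 = (1n << 64n) - 1n;

function fnv1a64(text) {
  let hash = FNV_OFFSET_64;
  for (const byte of new TextEncoder().encode(text)) {
    hash ^= BigInt(byte);
    hash = (hash * FNV_PRIME_64) & MASK_64;
  }
  return hash;
}

// Hypothetical composite key: customer, language pair and segment hash.
function segmentKey(customer, languagePair, text) {
  return `${customer}|${languagePair}|${fnv1a64(text).toString(16)}`;
}

// Plain Map/Set stand in for the dynamic (evicting) caches described above.
class SegmentCaches {
  constructor() {
    this.main = new Map();   // key -> ACTIVE translation
    this.queued = new Set(); // keys of segments already in the translation queue
  }

  // Look a segment up without touching the database: return its ACTIVE
  // translation, report that it is already queued, or queue it now.
  lookup(customer, languagePair, text) {
    const key = segmentKey(customer, languagePair, text);
    if (this.main.has(key)) return { status: "ACTIVE", translation: this.main.get(key) };
    if (this.queued.has(key)) return { status: "QUEUED" };
    this.queued.add(key);
    return { status: "NEWLY_QUEUED" };
  }

  // Activation in WebCATT: leave the queue cache, enter the main cache.
  activate(customer, languagePair, text, translation) {
    const key = segmentKey(customer, languagePair, text);
    this.queued.delete(key);
    this.main.set(key, translation);
  }

  // Deactivation in WebCATT: drop from the main cache.
  deactivate(customer, languagePair, text) {
    this.main.delete(segmentKey(customer, languagePair, text));
  }
}
```

Because two different texts can in principle share a 64-bit hash, a production cache might also store the original text and compare it on a hit.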

A main file cache is a dynamic long-lived cache that stores ACTIVE files keyed by their names. This allows the quick lookup of a translated file. Files are removed from this cache if they are deactivated in the WebCATT 408. A translation queue file cache is a dynamic long-lived cache that stores the files of all pages that are in the translation queue. This allows the translation server 400 to determine that a file that has not yet been translated is already in the queue for translation, without having to search the database. Files are removed from this cache when they are activated in the WebCATT 408.

A translation queue page cache is a static long-lived cache that stores all pages that are in the translation queue. This allows the translation server 400 to determine that a page that has not yet been translated is already in the queue for translation, without having to search the database. A 64-bit hash code is used to determine whether a page in the queue has changed and has to be re-scheduled for translation. Pages are removed from this cache when they are activated in the WebCATT 408.
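Change detection with a page hash can be sketched as below. This is a minimal sketch under assumptions: FNV-1a (64-bit) is a swapped-in hash because the source names only the hash width, the status strings and class name are hypothetical, and a plain Map models the static cache (no automatic eviction).

```javascript
// FNV-1a (64-bit), a swapped-in choice of 64-bit hash.
function fnv1a64(text) {
  let hash = 0xcbf29ce484222325n;
  for (const byte of new TextEncoder().encode(text)) {
    hash ^= BigInt(byte);
    hash = (hash * 0x100000001b3n) & ((1n << 64n) - 1n);
  }
  return hash;
}

// Static cache: a plain Map, entries leave only when a page is activated.
class PageQueueCache {
  constructor() {
    this.pages = new Map(); // page URL -> hash of the content last scheduled
  }

  // Returns "SCHEDULED" for a new page, "RESCHEDULED" if its content changed
  // since it was queued, or "UNCHANGED" if it is already queued as-is.
  schedule(url, html) {
    const hash = fnv1a64(html);
    const previous = this.pages.get(url);
    this.pages.set(url, hash);
    if (previous === undefined) return "SCHEDULED";
    return previous !== hash ? "RESCHEDULED" : "UNCHANGED";
  }

  // Activation in WebCATT removes the page from the queue cache.
  activate(url) {
    this.pages.delete(url);
  }
}
```

Storing only the hash keeps each entry small, so the cache can safely be static even for large sites.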

The translation server 400 is advantageous as it does not require IT integration with an existing web site infrastructure. The present invention converts the outbound HTML stream after it has left the client web server 414. Thus, there is no need to re-architect an existing web site or build a separate web site for the alternate language. Further, no client storage or management of translated data is required. Translated data is managed and maintained by the WebCATT 408 software outside of the web site's database.

The translation server 400 is further advantageous as it works with any client web server hardware and software technology infrastructure. Further, it allows for evolution of the client's existing hardware and software technology infrastructure. Moreover, deployment of the present invention requires minimal effort, as fewer client IT resources are required. The one-link deployment feature involves the client placing one link on the web site 414 to provide access to the alternate language web site. Therefore, deployment is rapid and cost effective.

WebCATT

The WebCATT (Web Computer Aided Translation Tool) 408 is a web-based Graphical User Interface (GUI) application that is used to perform and manage human translations. The tool is built specifically for web (HTML) page translations. It can be used by professional translators to translate web site translatable components and by managers to manage the translation process. Since WebCATT 408 is a web-based application that is accessed via the Internet 412, translators and managers can be located in different geographical areas.

WebCATT 408 is similar to other computer aided translation tools used by professional translation service organizations. WebCATT 408 supports localization, text recognition, fuzzy matching, translation memory, internal repetitions, alignment and a glossary/terminology database. WebCATT 408 is designed for web site translation and includes other features optimized for web translation, such as What You See Is What You Get (WYSIWYG) HTML previewing and support for image/graphic translation.

WebCATT 408 organizes the translation workload into web pages. A web page is the HTML content generated by a specific URL address, regardless of whether that content is static (i.e., physically resides on the web server in a file with an .html extension) or dynamic (i.e., the content is generated dynamically by combining information from a database and HTML templates). Dynamic pages that are dependent on session information (e.g., a shopping cart checkout page) are also supported.

Within a web page there are two types of units of translation that translators work with: text segments and files. A text segment is a chunk of text on the page as defined by the HTML that surrounds it. A text segment can range from a single word to a paragraph or multiple paragraphs. A file is any type of external content that resides in a file, is linked from within the page, and may require translation. Typical types of files found in web pages are images, PDF files, MS Word documents and Flash movies. A file is translated by uploading a replacement file that has all text and/or sounds translated.

FIG. 9 is a screenshot of a WebCATT interface used for viewing a translatable component, in one embodiment of the present invention. FIG. 9 shows a display area 902 in which a web page including translatable components in a first language (in this case, English) is displayed. Also shown in FIG. 9 is a section 904 including information associated with the web page displayed in display area 902, such as page status, page URL, page ID, etc. Further shown in FIG. 9 is a section 906 including statistics associated with the web site from which the displayed web page is garnered, such as the number of files translated, the number of segments translated, the number of translations suppressed, etc.

FIG. 10 is a screenshot of a WebCATT interface used for viewing a translatable component along with a corresponding translation, in one embodiment of the present invention. FIG. 10 shows a display area 1002 in which an original image file translatable component is displayed in a first language (in this case, English). FIG. 10 shows a display area 1004 in which a translated image file is displayed in a second language (in this case, Spanish). Also shown in FIG. 10 is a section 1006 including information associated with the file displayed in display areas 1002-1004, such as file status, file URL, file ID, etc. FIG. 10 shows how WebCATT 408 allows a user to view a translatable component alongside a corresponding translated component for comparison.

FIG. 11 is a screenshot of a WebCATT interface used for editing a translatable component, in one embodiment of the present invention. FIG. 11 shows a display area 1102 in which a web page including a translated component in a second language (in this case, Spanish) is displayed. The display area 1102 provides a WYSIWYG web page preview feature that allows viewing the translated web page as it is being translated. Translations can often result in a significant amount of word growth (e.g., approx. 20% from English to Spanish) or shrinkage, which can result in carefully formatted web page layouts being knocked out of alignment by the longer text. The WYSIWYG page preview feature allows translators to immediately see the translated web pages and quickly adjust word choice in order to maintain the correct alignment and layout of the translated page.

Also shown in FIG. 11 is a section 1104 including information associated with the web page displayed in display area 1102, such as page status, page URL, page ID, etc. Further shown in FIG. 11 is a section 1106 including statistics associated with the web site from which the displayed web page is garnered, such as the number of files translated, the number of segments translated, the number of translations suppressed, etc. In addition to each of those statistics, a breakdown of translated and not translated components is shown in both units and percentages.

A section 1110 provides a text segment edit form that allows a translator to edit text segments in the order they appear on the page. This form features a fuzzy search feature that automatically shows and sorts existing segment matches in the database. The translator can copy an existing translation from the search results area to use as a starting translation.

A section 1108 provides a file list form that allows a translator to preview all linked files on the page. The list form allows the translator to select all files that do not require translation (e.g., an image with no text) and quickly tag them as such. It also allows a translator to select individual files for translation via the file edit form. File translation involves uploading a translated file and translating the file text description, if present.

The GUI of FIG. 11 allows a user to view the plurality of translated components placed into the format derived from the first, or source, content, thereby enabling a user to review how the translated components are rendered in the first content format. The GUI of FIG. 11 further allows a user to highlight any of the plurality of translatable components that are not yet translated differently from translated components when previewing the plurality of translated components in the first content format. The GUI of FIG. 11 further allows a user to display text when hovering over a translated component so as to view the first content corresponding to the translated component.

The GUI of FIG. 11 further allows a user to select at least one of the translated components when previewing the plurality of translated components in the first content format so as to edit the translated component and store the revised translated component with the corresponding unique identifier. Lastly, the GUI of FIG. 11 further allows previewing in a multi-user environment, so that more than one user can simultaneously view translated components rendered in the first content format.

WebCATT 408 also provides complete management of the translation process. Web pages are scheduled for translation either automatically by the translation server 400, or manually by a manager via upload of web pages or other types of content to be translated. When a web page is scheduled for translation, it is placed in the translation queue of a specific customer. Pages are scheduled for translation on a priority basis using algorithms based on the percentage of the page already translated and how often the page is being accessed on the original web server while it is in the translation queue. This allows the most important pages (i.e., the most frequently accessed pages and those with smaller changes) to be translated first.
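One possible prioritization rule can be sketched as follows. This is a minimal sketch under a loudly stated assumption: the source names the two inputs (percentage already translated and access frequency while queued) but not the formula, so the score below, which multiplies hits by a boost for pages that are already mostly translated, is purely hypothetical, as are the field names.

```javascript
// Hypothetical priority score: pages hit often while queued rank higher,
// and pages that are already mostly translated (smaller changes) get a boost.
// The source names these two inputs but not the formula.
function priorityScore(page) {
  const fraction = Math.min(Math.max(page.percentTranslated, 0), 100) / 100;
  return page.hitsWhileQueued * (1 + fraction);
}

// Highest-priority pages first; the input array is not modified.
function orderQueue(pages) {
  return [...pages].sort((a, b) => priorityScore(b) - priorityScore(a));
}
```

With equal traffic, a page that is 90% translated outranks one that is 10% translated, and a rarely viewed page falls behind both even if it is nearly done.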

Once pages are in the queue, a manager can assign them for translation to a specific translator or translation service subcontractor. If pages are assigned to a subcontractor, a subcontractor manager can then assign them to specific translators within the subcontractor organization, or even to freelancers that work with them. Proofers can also be assigned: a subcontractor can assign its own proofers to pages, and managers can also assign proofers to check the work of translators or subcontractors.

A web page must go through a series of status changes before it is available via the Internet. A page can have any of the following statuses: NEW, IN-PRODUCTION and ACTIVE. When a page is placed in the queue, its status is NEW. When a translator first accesses the page for the purpose of translating it, its status is changed to IN-PRODUCTION. After the page is fully translated and proofed, a manager changes its status to ACTIVE. Only ACTIVE pages are available via the Internet.

In addition to the page statuses, the text and files within the page maintain their own translation status. The status for text segments and files is maintained both at the page level (i.e., one single overall status for all segments in the page and another one for all files in the page) and individually. A text segment or file can have any of the following statuses: NEW, TRANSLATED, CONTRACTOR_PROOFED, PROOFED and ACTIVE. The initial status is NEW. After a translator translates the text or file, the status is changed to TRANSLATED. When the translation is proofed by a subcontractor proofer, the status is changed to CONTRACTOR_PROOFED, and when it is proofed by an internal proofer, the status is changed to PROOFED. Finally, the manager changes the status to ACTIVE. A page can only be activated after all segments and files within it are ACTIVE.
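The status rules can be sketched as two small transition tables plus the page activation check. This is a minimal sketch: the table and function names are hypothetical, and allowing TRANSLATED to move straight to PROOFED (when no subcontractor proofer is involved) is an assumption the source does not state explicitly.

```javascript
// Component (text segment or file) transitions. Allowing TRANSLATED to go
// straight to PROOFED (no subcontractor proofing) is an assumption.
const COMPONENT_TRANSITIONS = {
  NEW: ["TRANSLATED"],
  TRANSLATED: ["CONTRACTOR_PROOFED", "PROOFED"],
  CONTRACTOR_PROOFED: ["PROOFED"],
  PROOFED: ["ACTIVE"],
  ACTIVE: [],
};

// Page transitions.
const PAGE_TRANSITIONS = {
  NEW: ["IN-PRODUCTION"],
  "IN-PRODUCTION": ["ACTIVE"],
  ACTIVE: [],
};

// Return the new status, or throw if the move is not allowed.
function advance(transitions, current, next) {
  if (!(transitions[current] || []).includes(next)) {
    throw new Error(`illegal transition ${current} -> ${next}`);
  }
  return next;
}

// A page may only become ACTIVE once every segment and file in it is ACTIVE.
function activatePage(page) {
  if (!page.components.every((component) => component.status === "ACTIVE")) {
    throw new Error("all segments and files must be ACTIVE first");
  }
  page.status = advance(PAGE_TRANSITIONS, page.status, "ACTIVE");
  return page;
}
```

Keeping the rules in data tables makes it easy to add a status later without touching the checking logic.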

FIG. 12 is a screenshot of a WebCATT interface used for viewing a translation queue, in one embodiment of the present invention. FIG. 12 shows a series of columns in which a unit of information is provided for each page of the web site 414 listed on each row. FIG. 12 shows a first column 1202 including unique page identifiers. Column 1204 includes a URL for each page. Column 1206 includes receipt data for each page. Column 1208 includes a percentage statistic indicating the percentage of the page that has been translated. Column 1210 indicates a status for each page. Column 1212 indicates the contractor assigned to the page.

FIG. 13 is an operational flow diagram depicting the process of WebCATT 408, according to a preferred embodiment of the present invention. The operational flow diagram of FIG. 13 depicts the process by which WebCATT 408, which provides a web-based tool for managing language translations of content, queues and translates components of a web site 414. The operational flow diagram of FIG. 13 begins with step 1302 and flows directly to step 1304.

In step 1304, WebCATT 408 retrieves a first content, or HTML source page, in a first language from the web site 414. In step 1306, WebCATT 408 parses the first content into a plurality of translatable components. In step 1308, WebCATT 408 generates a unique identifier for each of the plurality of translatable components of the first content. In step 1310, WebCATT 408 queues the plurality of translatable components and corresponding unique identifiers for human or machine translation into a second language.

In step 1312, for each of the plurality of translatable components, WebCATT 408 stores a translated component and an associated unique identifier corresponding to the translatable component, thereby storing a plurality of translated components and corresponding unique identifiers.

In step 1314, WebCATT 408 provides the plurality of translatable components and corresponding unique identifiers to a third party for human translation into a second language. In step 1316, the control flow of FIG. 13 stops.

WebCATT 408 is advantageous as it allows translators to work directly with live pages off the web site 414 being translated. Thus, the client web site 414 need not send information to the translation server 400 for translation. Furthermore, all web pages in a web site are automatically entered into the translation work queue by the WebCATT 408 spider 404, described in greater detail below.

WebCATT 408 is further advantageous as the WYSIWYG preview allows translators to see translated web pages as they would appear on the live web site. This allows the translator to compensate for word growth or shrinkage that knocks a web page layout out of alignment.

Furthermore, a translated preview page is marked up with special HTML and JavaScript to allow: 1) color coding of all text in the web page, so the translator can see what is already translated, what remains to be translated and where the current text segment is located within the page; 2) clicking on text or a file to take the translator to a form to edit the translation for that text or file; and 3) hovering the mouse over text or a file to pop up a window showing the original wording or file.
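The preview mark-up for text segments can be sketched as below. This is a minimal sketch under assumptions: the CSS class names, the data-segment-id attribute and the segment object shape are all hypothetical; a stylesheet would supply the colors, a click handler reading data-segment-id would open the edit form, and the browser's native title tooltip stands in for the pop-up showing the original wording.

```javascript
// Escape text for use in HTML content and double-quoted attributes.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;")
    .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// segments: [{ id, original, translation }], translation === null when the
// segment is not yet translated. Class names are hypothetical; a stylesheet
// would color them, and a click handler would use data-segment-id.
function previewMarkup(segments, currentId) {
  return segments.map((seg) => {
    const classes = [seg.translation === null ? "mp-untranslated" : "mp-translated"];
    if (seg.id === currentId) classes.push("mp-current");
    const shown = seg.translation === null ? seg.original : seg.translation;
    // The title attribute shows the original wording on hover.
    return `<span class="${classes.join(" ")}" data-segment-id="${escapeHtml(String(seg.id))}"` +
      ` title="${escapeHtml(seg.original)}">${escapeHtml(shown)}</span>`;
  }).join("");
}
```

Untranslated segments are shown in their original wording, so the preview never has gaps while translation is in progress.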

WebCATT 408 is further advantageous as pages are parsed into their translatable components, and translators work only with these components, not with a complex group of HTML files. All HTML and script code is hidden when using WebCATT 408. WebCATT 408 is further beneficial as it can be utilized via the ASP model, and translators can access it via the web. Translated pages can be delivered via the translation server 400 or saved as static HTML pages to be sent to the client, in which case links among pages are modified so they reference the translated pages.

WebCATT 408 is further beneficial as it allows management of the translation process. Multiple user access levels are supported: managers, proofers, translators and subcontractors. Managers can assign work in the translation queue to translators, proofers and/or subcontractors. Subcontractor managers can in turn sub-assign work to subcontractor translators and proofers. Managers must activate web pages before the translation server 400 can deliver them.

TransScope

A spider is a program that visits web sites and reads their pages and other information in order to create entries for an index, such as a search engine index. For example, the major search engines on the Internet all have such a program, which is also known as a “crawler” or a “bot.” Spiders are typically programmed to visit web sites that have been submitted by their owners as new or updated. Entire web sites or specific pages can be selectively visited and indexed. Spiders are called spiders because they usually visit many web sites in parallel, their “legs” spanning a large area of the “web.” Spiders can crawl through a web site's pages in several ways.

One way a spider can crawl through a web site is to follow all the hypertext links in each page until all the pages have been read. The spiders for the major search engines on the Internet adhere to the rules of politeness for web spiders that are specified in a standard for robot exclusion. Under this standard, a spider asks each server which files should be excluded from being indexed. A spider does not (or cannot) go through a firewall. The standard also prescribes a special algorithm for waiting between successive server requests, so that the spider does not affect web site response time for other users.

The operations of a spider contrast with those of a normal web browser operated by a human, which does not automatically follow links other than inline images and URL redirection. The algorithm used by a spider to pick which references to follow strongly depends on the spider's purpose. Index-building spiders usually retrieve a significant proportion of the references. At the other extreme are spiders that try to validate the references in a set of documents; these spiders usually do not retrieve any of the links apart from redirections.

FIG. 4 shows a spider 404 for use in analyzing and sizing a web site 414. The spider 404 is a tool that crawls specific web sites and performs any of a variety of actions. The spider 404 can crawl a web site in order to populate the WebCATT translation queue with new or updated information. The spider 404 may also gather content statistics that can be used to provide a monetary quote for deployment of the present invention.

FIG. 14 is an operational flow diagram depicting the process of spider 404, according to a preferred embodiment of the present invention. The operational flow diagram of FIG. 14 depicts the process by which spider 404, which provides a web-based tool for sizing a web site for language translation, retrieves and indexes translatable components of a web site 414. The operational flow diagram of FIG. 14 begins with step 1402 and flows directly to step 1404.

In step 1404, spider 404 retrieves a first content, or HTML source page, in a first language from the web site 414. The first content in the first language is for translation into a second content in a second language. The second web content is a human or machine translation in the second language of the first web content. In step 1406, spider 404 parses the first content into a plurality of translatable components. A translatable component includes any one of a text segment, an image file with text to be translated, a multimedia file with text or audio to be translated, a file with text to be translated, a file with an image to be translated, a file with audio to be translated and a file with video with at least one of text and audio to be translated.
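The parsing step can be sketched as splitting a page into text segments and linked files. This is a minimal sketch, not the spider's parser: it splits text at every tag (a real parser would keep inline formatting such as <b> inside one segment, matching the "defined by the surrounding HTML" rule), does not decode HTML entities, drops script and style blocks (handled by the script directives instead), and uses a hypothetical list of file extensions.

```javascript
// Hypothetical list of extensions treated as translatable files.
const FILE_EXTENSIONS = /\.(gif|jpe?g|png|pdf|docx?|swf|mp3|mp4)$/i;

// Split a page into text segments and linked files. Simplifications: text
// is split at every tag and entities are not decoded.
function parseComponents(html) {
  // Drop script and style blocks; their contents are handled separately.
  const body = html.replace(/<(script|style)\b[\s\S]*?<\/\1>/gi, "");
  const segments = body
    .split(/<[^>]+>/)
    .map((text) => text.replace(/\s+/g, " ").trim())
    .filter((text) => text !== "");
  const files = [];
  for (const m of body.matchAll(/<(?:img|a|embed)\b[^>]*?\b(?:src|href)="([^"]+)"/gi)) {
    // Ignore query strings and fragments when checking the extension.
    const path = m[1].split(/[?#]/)[0];
    if (FILE_EXTENSIONS.test(path)) files.push(m[1]);
  }
  return { segments, files };
}
```

Links to ordinary pages (such as .jsp URLs) are not files to translate; they are pages the spider would crawl next.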

In step 1408, spider 404 generates a unique identifier for each of the plurality of translatable components of the first content. For a text segment, the translation server 400 can generate a unique identifier using a hash code, a checksum or a mathematical algorithm. In step 1410, spider 404 stores the plurality of translatable components and corresponding unique identifiers in the database 406 for human or machine translation into the second language.

In optional step 1412, spider 404 queues the plurality of translatable components and corresponding unique identifiers for human or machine translation into a second language. In optional step 1414, spider 404 provides the plurality of translatable components and corresponding unique identifiers to WebCATT 408 for human translation into a second language. In step 1416, spider 404 generates statistics based on the translatable components retrieved from the web site 414. The statistics generated include a file count, a page count, a translatable segment count, a unique text segment count, a unique text segment word count and a word count. The spider 404 can further generate a web page having a link to each file of the web site 414. In step 1418, the control flow of FIG. 14 stops.
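The statistics of step 1416 and the file index page can be sketched as follows. This is a minimal sketch under assumptions: the page shape and function names are hypothetical, words are counted by splitting on whitespace (which undercounts languages written without spaces), and each file is counted once site-wide even if several pages link to it.

```javascript
// Count whitespace-separated words.
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// pages: [{ url, segments: [text...], files: [url...] }] as produced by the
// parsing step. Counting each file once site-wide is an assumption.
function siteStatistics(pages) {
  const uniqueSegments = new Set();
  const uniqueFiles = new Set();
  let segmentCount = 0;
  let wordCount = 0;
  for (const page of pages) {
    for (const segment of page.segments) {
      segmentCount += 1;
      wordCount += countWords(segment);
      uniqueSegments.add(segment);
    }
    for (const file of page.files) uniqueFiles.add(file);
  }
  let uniqueWordCount = 0;
  for (const segment of uniqueSegments) uniqueWordCount += countWords(segment);
  return {
    pageCount: pages.length,
    fileCount: uniqueFiles.size,
    segmentCount,
    uniqueSegmentCount: uniqueSegments.size,
    wordCount,
    uniqueWordCount,
  };
}

// A simple page linking to every distinct file, for later review.
function fileIndexPage(pages) {
  const escape = (s) => s.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
  const files = new Set(pages.flatMap((page) => page.files));
  const links = [...files].map((f) => `<a href="${escape(f)}">${escape(f)}</a><br>`);
  return `<html><body>\n${links.join("\n")}\n</body></html>`;
}
```

The gap between wordCount and uniqueWordCount shows how much repeated text (navigation bars, footers) a translation memory would absorb, which is useful when estimating cost.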

The spider 404 can be pre-configured for each customer web site so that the use of directive tags and/or attributes is eliminated or minimized. This minimizes the workload of the customer web site's IT personnel. Further, the spider 404 can be separately pre-defined by domain and/or by URL pattern. This allows specifying sections of a web site to be translated without the need for placing directive tags in each web page.

The spider 404 is advantageous as it can be used to update the WebCATT 408 translation work queue. Further, spider 404 can be used to gather statistics about a web site 414 in order to allow estimating the amount of work involved in translating the web site and pricing accordingly.

Spider 404 can summarize word counts, segment counts, file counts and page counts of a web site 414. The spider 404 further supplements the functions of WebCATT 408, as it saves all unique text segments and file URLs in the database 406 for later translation into a second language. It can further create an HTML page containing links to all files of web site 414, so the files can be reviewed for translation at a later time.

The spider 404 navigates and crawls a web site 414 efficiently because it can emulate a browser by saving and returning cookies. Spider 404 can further fill out and submit forms with pre-defined information and is able to establish a session and normalize session ID parameters for e-commerce sites. Spider 404 can further be configured to crawl only specific areas of a web site by defining include/exclude domains and URL patterns. Spider 404 can also be configured to send specific HTTP headers, such as the user-agent (i.e., type of browser). Spider 404 can be executed on a single computer or in distributed mode. In distributed mode, multiple machines work in conjunction to crawl the same web site simultaneously, sharing the same database 406.
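Two of the crawl controls described above, include/exclude rules and session ID normalization, can be sketched with the Python standard library. The domain, the exclude patterns and the list of session parameter names are hypothetical configuration values, and stripping the session parameters is only one assumed way to "normalize" them.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical per-customer configuration; the values are illustrative only.
INCLUDE_DOMAINS = {"www.example.com"}
EXCLUDE_PATTERNS = [re.compile(r"^/admin/"), re.compile(r"\.zip$")]
SESSION_PARAMS = {"sessionid", "jsessionid", "sid"}


def should_crawl(url: str) -> bool:
    """Crawl only included domains whose path matches no exclude pattern."""
    parts = urlsplit(url)
    if parts.hostname not in INCLUDE_DOMAINS:
        return False
    return not any(p.search(parts.path) for p in EXCLUDE_PATTERNS)


def normalize_session(url: str) -> str:
    """Drop session-ID query parameters so the same page is not crawled twice."""
    parts = urlsplit(url)
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Normalizing before a URL is recorded keeps a session-bearing link such as `/p?id=3&sessionid=abc` from being treated as a new page on every visit.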

TransSync

Most web sites are continuously updated with new information, but keeping an alternate language web site up to date presents a challenge when using traditional methods. The system of the present invention provides an elegant solution to this problem: several methods for keeping an alternate language web site up to date.

Automatic maintenance keeps the alternate language web site synchronized with the original site with little or no human intervention. Automatic maintenance relies on a function of the translation server 400: the translation server 400 automatically schedules a web page for translation by placing it in the WebCATT 408 translation queue (described in more detail above) whenever a translation cannot be found for one or more text segments or linked files in the page. Thus, the act of viewing a never-before-translated or a modified page in the alternate language schedules that web page for translation.
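The auto-scheduling behavior can be sketched as a lookup that falls back to the original text and enqueues the page when any segment lacks a translation. The in-memory dict standing in for the translation store, the `deque` standing in for the WebCATT 408 queue, and the function names are assumptions; linked files are omitted for brevity.

```python
import hashlib
from collections import deque


def segment_id(text: str) -> str:
    """Assumed identifier: SHA-256 of the whitespace-normalized segment."""
    return hashlib.sha256(" ".join(text.split()).encode("utf-8")).hexdigest()


def translate_or_schedule(url, segments, translations, queue):
    """Return the page's segments in the alternate language where available.

    translations: dict of segment ID -> translated text (hypothetical store).
    queue: pending translation work; the URL is enqueued once if any
    segment is missing a translation.
    """
    output, missing = [], False
    for seg in segments:
        translated = translations.get(segment_id(seg))
        if translated is None:
            missing = True
            output.append(seg)  # fall back to the original-language text
        else:
            output.append(translated)
    if missing and url not in queue:
        queue.append(url)
    return output


queue = deque()
store = {segment_id("Hello"): "Hola"}
print(translate_or_schedule("/a", ["Hello", "Sale"], store, queue), list(queue))
```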

There are several ways to leverage the auto-scheduling function of the translation server 400. One way involves manual quality assurance review. If a new or updated web page goes through a manual quality assurance process in which a person reviews the page before it is released to the live web site, then the quality assurance personnel simply attempt to view the page in the alternate language during the review process. This places the new web page in the WebCATT 408 translation queue for translation before the page goes into the production (live) web site. General Information and Policy type web pages are good candidates for this process.

Another way to leverage the auto-scheduling function of the translation server 400 involves the spider agent 404. For web pages that do not undergo an individual quality assurance review before going into production, the spider agent 404 can be used to crawl a web site, or just portions of a web site, in the alternate language on a regular basis. Crawling the web site in the alternate language is equivalent to a user viewing the site in the alternate language, and thus results in any new or modified pages being placed in the WebCATT 408 translation queue.

This technique is ideal for regularly scheduled updates to a web site, which normally happen after hours. For example, if the ABC Widgets web site modifies its sale offerings twice a week, such as on Mondays and Fridays at 12 AM, then the spider agent 404 can be scheduled to crawl the relevant parts of the site shortly afterward (e.g., at 12:30 AM) on those days. Around-the-clock translators can then translate the new sale banners so that the alternate language web site is up to date later that morning.

The spider agent 404 can also be used to regularly (e.g., daily) crawl a web site even when changes are not regularly scheduled. This will guarantee that the alternate language site is in sync with the original language site after every crawl and subsequent translation.

Another way to leverage the auto-scheduling function of the translation server 400 involves user access. Even if no manual quality assurance reviews or scheduled spider agent 404 crawls are performed, the alternate language web site is still automatically kept up to date over the long term. This is because the first online user who attempts to view a new or modified page in the alternate language triggers the placement of that page into the WebCATT 408 translation queue. In that case, the online user will see the page in the original language or will see a partially translated page, depending on the amount of new content in the page and the pre-defined customer-specified translation threshold. However, subsequent users who access the page will see the web page in the alternate language after it has been translated.
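The specification does not say exactly how the customer-specified translation threshold is applied. One plausible reading, assumed in this sketch, treats it as the minimum fraction of a page's segments that must already be translated before a mixed-language page is served; below that fraction the visitor sees the original page. The function name and return labels are hypothetical.

```python
def choose_rendering(total_segments: int, translated_segments: int, threshold: float) -> str:
    """Decide what a visitor sees for a page that may be only partly translated.

    threshold: hypothetical customer setting in [0, 1], read here as the
    minimum translated fraction required to serve a partially translated page.
    """
    if translated_segments >= total_segments:
        return "translated"  # nothing is missing (also covers empty pages)
    ratio = translated_segments / total_segments
    return "partial" if ratio >= threshold else "original"


print(choose_rendering(10, 9, 0.8), choose_rendering(10, 5, 0.8))
```

In either non-translated case the page would already have been queued for translation, so later visitors see the fully translated version.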

In addition to automatic maintenance, the present invention also supports manual maintenance of the alternate language web site to keep it synchronized with the original site. New information that needs translation can also be manually placed in the translation queue using WebCATT 408. This can be useful for translating large amounts of data that are available in advance of appearing on the live web site 414. For example, if the ABC Widgets web site updates its web site with new product offerings every Thursday morning and all product information is available by the previous Tuesday, then all new product data can be manually batched into the translation queue using WebCATT 408 as soon as it is available, so it is fully translated by the time the new web pages go live.

Population of the WebCATT 408 translation queue can be performed either by URL or by content. Population by URL means that translation server 400 stores only the URL of the page in the queue. The content of the URL is retrieved afterwards, when a translator accesses the page to translate it using WebCATT 408. Population by URL can present a problem if the content of the page is dependent on session information, such as a session ID present in a query parameter or stored in a cookie. In that case, the session ID in the query parameter may have expired, or the session information stored in the cookie will not be present when viewing the page in WebCATT 408. This is usually the case in shopping cart or account access pages.

Session dependent pages can be handled in two ways: (1) by replicating the session state via cookies and/or updated session parameters, or (2) by populating the page by content. Replicating the session state means that the translator must manually re-acquire a session from the original site and then enter the session data in WebCATT 408. Once the session data is entered, it can be used for translating multiple pages. Population by content means that translation server 400 stores the full content of the page in the queue. This avoids the session dependence issue, but can result in outdated content. As a result, population by content is only used for session dependent pages, and population by URL, which guarantees that the content being translated is the latest content, is used for all other pages.
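The population rule above (by content only for session dependent pages, by URL otherwise) can be expressed as a small factory function. The `QueueEntry` type and the `session_dependent` flag are hypothetical; how a page is recognized as session dependent is not specified here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueueEntry:
    url: str
    content: Optional[str] = None  # set only when populated by content


def make_queue_entry(url: str, html: str, session_dependent: bool) -> QueueEntry:
    """Store the URL only, unless the page depends on session state.

    A URL-only entry is re-fetched when a translator opens it, so the
    translator sees the latest content. A session-dependent page (e.g. a
    shopping cart) is snapshotted, because its URL may not reproduce the
    same content once the session has expired.
    """
    if session_dependent:
        return QueueEntry(url=url, content=html)
    return QueueEntry(url=url)
```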

Access to the WebCATT 408 translation queue is segmented by customer and prioritized. Pages to be translated are scheduled for translation on a priority basis using algorithms based on the percentage of the page already translated and how often the page is being accessed on the original web server while the page is in the translation queue. This allows the most important pages (i.e., those most frequently accessed and those with smaller changes) to be translated first.
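The per-customer prioritization can be sketched with Python's `heapq`, keeping one heap per customer. The ordering used here (access count first, then percentage already translated) is an assumed scoring rule, since the specification names the two factors but not how they are combined; this sketch also fixes each page's priority when it is enqueued rather than updating it as new accesses arrive. `TranslationQueue` is a hypothetical name.

```python
import heapq
from collections import defaultdict


class TranslationQueue:
    """Per-customer priority queues (hypothetical structure).

    Assumed ordering: pages accessed more often come first; ties go to
    pages with a higher percentage already translated (smaller changes).
    """

    def __init__(self):
        self._heaps = defaultdict(list)
        self._counter = 0  # tie-breaker keeps insertion order among equal keys

    def push(self, customer, url, hits, percent_translated):
        # heapq is a min-heap, so negate the values that should rank higher.
        key = (-hits, -percent_translated, self._counter)
        heapq.heappush(self._heaps[customer], (key, url))
        self._counter += 1

    def pop(self, customer):
        return heapq.heappop(self._heaps[customer])[1]


q = TranslationQueue()
q.push("abc", "/about", hits=5, percent_translated=90)
q.push("abc", "/sale", hits=50, percent_translated=80)
print(q.pop("abc"))
```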

A file change detection feature can be used to deal with files whose content has changed although their names have not. The translation server 400 and WebCATT 408 match a file to be translated with its translated file by the URL of the original file. However, it is possible for a file's content to change while its name and location remain the same. In that case, an outdated translated file could be used for the translation.

To overcome this issue, the translation server 400 computes a hash code or checksum based on the binary content of the file and stores it with the URL. At conversion time, each time a file is presented for translation, the translation server 400 re-computes the hash code or checksum and compares it against the stored one. If they match, the file has not changed and the existing translated file can be used as a replacement. However, if they do not match, the binary content of the file has changed and the existing file translation cannot be used. In that case, the page that contains the file is placed in the WebCATT 408 translation queue so the file may be re-translated.
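The file change check amounts to comparing a stored digest with a freshly computed one. This sketch assumes SHA-256 as the hash code and a plain dict as the store; `file_digest` and `lookup_translated_file` are hypothetical names, and a `None` result signals that the containing page should be re-queued.

```python
import hashlib


def file_digest(data: bytes) -> str:
    """Hash the binary content of a file (SHA-256 assumed)."""
    return hashlib.sha256(data).hexdigest()


def lookup_translated_file(url, data, stored):
    """stored: dict of original-file URL -> (digest, translated file URL).

    Returns the translated file URL if the original is unchanged,
    otherwise None, meaning the page containing the file must be re-queued.
    """
    entry = stored.get(url)
    if entry is None:
        return None  # never translated
    digest, translated_url = entry
    return translated_url if digest == file_digest(data) else None
```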

FIG. 15 is an operational flow diagram depicting the synchronization process according to a preferred embodiment of the present invention. The operational flow diagram of FIG. 15 depicts the automated maintenance process that keeps the alternate language web site synchronized with the original web site 414. The operational flow diagram of FIG. 15 begins with step 1502 and flows directly to step 1504.

In step 1504, a first content in a first language, or HTML source page, is retrieved from the web site 414. The first content in a first language is for translation into a second content in a second language. The second web content is a human or machine translation in a second language of the first web content. In step 1506, the first content is parsed into a plurality of translatable components.

In step 1508, a unique identifier is generated for each of the plurality of translatable components of the first content. For a text segment, a unique identifier is generated using a hash code, a checksum or a mathematical algorithm.

In step 1510, a plurality of translated components of the second web content are identified or matched using the unique identifier of each of the plurality of translatable components of the first web content. If a translatable component of the first web content is not matched to a translated component of the second web content, in step 1512, the translatable component is designated for translation into the second language. In optional step 1514, the plurality of translatable components that were not matched are queued for human or machine translation into the second language. In optional step 1516, the plurality of translatable components that were not matched are provided to WebCATT 408 for translation into the second language. In step 1518, the control flow of FIG. 15 stops.
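Steps 1508 through 1512 of FIG. 15 can be sketched as follows. The sketch assumes a whitespace-normalized SHA-256 identifier and a dict keyed by identifier in place of the database of translated components; both are illustrative assumptions, and `synchronize` is a hypothetical name.

```python
import hashlib


def segment_id(text: str) -> str:
    """Step 1508 (assumed scheme): SHA-256 of the whitespace-normalized segment."""
    return hashlib.sha256(" ".join(text.split()).encode("utf-8")).hexdigest()


def synchronize(source_segments, translated_by_id):
    """Steps 1508-1512 of FIG. 15, sketched.

    source_segments: translatable components parsed from the first content.
    translated_by_id: unique identifier -> translated component (second content).
    Returns (matched, to_translate); duplicates are designated only once.
    """
    matched, to_translate = {}, []
    for seg in source_segments:
        sid = segment_id(seg)
        if sid in translated_by_id:          # step 1510: match by identifier
            matched[sid] = translated_by_id[sid]
        elif seg not in to_translate:        # step 1512: designate for translation
            to_translate.append(seg)
    return matched, to_translate
```

The `to_translate` list is what optional steps 1514 and 1516 would then hand to a translation queue or to WebCATT 408.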

Exemplary Implementations

The present invention can be realized in hardware, software, or a combination of hardware and software. A system according to a preferred embodiment of the present invention can be realized in a centralized fashion in one computer system, or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system, or other apparatus adapted for carrying out the methods described herein, is suited. A typical combination of hardware and software could be a general-purpose computer system with a computer program that, when loaded and executed, controls the computer system such that it carries out the methods described herein.

An embodiment of the present invention can also be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which, when loaded in a computer system, is able to carry out these methods. Computer program means or computer program as used in the present invention indicates any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code or notation; and b) reproduction in a different material form.

A computer system may include, inter alia, one or more computers and at least a computer readable medium, allowing a computer system to read data, instructions, messages or message packets, and other computer readable information from the computer readable medium. The computer readable medium may include non-volatile memory, such as ROM, Flash memory, Disk drive memory, CD-ROM, and other permanent storage. Additionally, a computer readable medium may include, for example, volatile storage such as RAM, buffers, cache memory, and network circuits. Furthermore, the computer readable medium may comprise computer readable information in a transitory state medium such as a network link and/or a network interface, including a wired network or a wireless network, that allow a computer system to read such computer readable information.

FIG. 16 is a block diagram of a computer system useful for implementing an embodiment of the present invention. The computer system includes one or more processors, such as processor 1604. The processor 1604 is connected to a communication infrastructure 1602 (e.g., a communications bus, cross-over bar, or network). Various software embodiments are described in terms of this exemplary computer system. After reading this description, it will become apparent to a person of ordinary skill in the relevant art(s) how to implement the invention using other computer systems and/or computer architectures.

The computer system can include a display interface 1608 that forwards graphics, text, and other data from the communication infrastructure 1602 (or from a frame buffer not shown) for display on the display unit 1610. The computer system also includes a main memory 1606, preferably random access memory (RAM), and may also include a secondary memory 1612. The secondary memory 1612 may include, for example, a hard disk drive 1614 and/or a removable storage drive 1616, representing a floppy disk drive, a magnetic tape drive, an optical disk drive, etc. The removable storage drive 1616 reads from and/or writes to a removable storage unit 1618 in a manner well known to those having ordinary skill in the art. Removable storage unit 1618 represents a floppy disk, magnetic tape, optical disk, etc., which is read by and written to by removable storage drive 1616. As will be appreciated, the removable storage unit 1618 includes a computer usable storage medium having stored therein computer software and/or data.

In alternative embodiments, the secondary memory 1612 may include other similar means for allowing computer programs or other instructions to be loaded into the computer system. Such means may include, for example, a removable storage unit 1622 and an interface 1620. Examples of such may include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM or PROM) and associated socket, and other removable storage units 1622 and interfaces 1620 which allow software and data to be transferred from the removable storage unit 1622 to the computer system.

The computer system may also include a communications interface 1624. Communications interface 1624 allows software and data to be transferred between the computer system and external devices. Examples of communications interface 1624 may include a modem, a network interface (such as an Ethernet card), a communications port, a PCMCIA slot and card, etc. Software and data transferred via communications interface 1624 are in the form of signals which may be, for example, electronic, electromagnetic, optical, or other signals capable of being received by communications interface 1624. These signals are provided to communications interface 1624 via a communications path (i.e., channel) 1626. This channel 1626 carries signals and may be implemented using wire or cable, fiber optics, a phone line, a cellular phone link, an RF link, and/or other communications channels.

In this document, the terms “computer program medium,” “computer usable medium,” and “computer readable medium” are used to generally refer to media such as main memory 1606 and secondary memory 1612, removable storage drive 1616, a hard disk installed in hard disk drive 1614, and signals. These computer program products are means for providing software to the computer system. The computer readable medium allows the computer system to read data, instructions, messages or message packets, and other computer readable information from the computer readable medium. The computer readable medium, for example, may include non-volatile memory, such as Floppy, ROM, Flash memory, Disk drive memory, CD-ROM, and other permanent storage. It is useful, for example, for transporting information, such as data and computer instructions, between computer systems. Furthermore, the computer readable medium may comprise computer readable information in a transitory state medium such as a network link and/or a network interface, including a wired network or a wireless network, that allow a computer to read such computer readable information.

Computer programs (also called computer control logic) are stored in main memory 1606 and/or secondary memory 1612. Computer programs may also be received via communications interface 1624. Such computer programs, when executed, enable the computer system to perform the features of the present invention as discussed herein. In particular, the computer programs, when executed, enable the processor 1604 to perform the features of the computer system. Accordingly, such computer programs represent controllers of the computer system.

Although specific embodiments of the invention have been disclosed, those having ordinary skill in the art will understand that changes can be made to the specific embodiments without departing from the spirit and scope of the invention. The scope of the invention is not to be restricted, therefore, to the specific embodiments. Furthermore, it is intended that the appended claims cover any and all such applications, modifications, and embodiments within the scope of the present invention.

What is claimed is:
 1. A method, implemented on a machine having at least one processor, storage, and a communication platform, for providing statistics characterizing translation work in synchronizing content in different languages, comprising: receiving a request from a user for accessing content hosted on a website in a first language, wherein the user requests to view the content in a second language and at least some of the content in the first language has previously been translated into the second language; obtaining the content in the first language from the website via a publicly accessible network path based on the request; parsing the obtained content in the first language into a plurality of translatable components; accessing a database that stores the content in the second language previously translated as translated components; identifying at least some of the plurality of translatable components that do not have a corresponding translated component in the database; generating statistics based on the at least some of the translatable components to estimate the work load involved in translation of the at least some of the translatable components from the first language to the second language; and providing the statistics to characterize a service related to synchronizing the content in the first and second languages.
 2. The method according to claim 1, wherein the translation includes human translating the at least some of the plurality of translatable components.
 3. The method according to claim 1, further comprising adding the at least some of the plurality of translatable components to a translation list for translation into the second language.
 4. The method according to claim 1, further comprising generating an identifier for each of the plurality of translatable components such that each of the plurality of translatable components is accessible via a corresponding identifier.
 5. The method according to claim 4, wherein the identifier for a text segment is generated using at least one of a hash code, a checksum, and a mathematical algorithm based on one or more text segments.
 6. The method according to claim 1, wherein the statistics include at least one of a file count, a page count, a text segment count, a unique text segment count, a word count, and a unique word count.
 7. The method of claim 1, wherein the generating comprises: computing the statistics based on information associated with any of the at least some of the plurality of translatable components that do not have a corresponding translated component in the second language.
 8. A machine readable non-transitory medium having information stored thereon for providing statistics characterizing translation work in synchronizing content in different languages, wherein the information, when read, causes the machine to perform the following: receiving a request from a user for accessing content hosted on a website in a first language, wherein the user requests to view the content in a second language and at least some of the content in the first language has previously been translated into the second language; obtaining the content in the first language from the website via a publicly accessible network path based on the request; parsing the obtained content in the first language into a plurality of translatable components; accessing a database that stores the content in the second language previously translated as translated components; identifying at least some of the plurality of translatable components that do not have a corresponding translated component in the database; generating statistics based on the at least some of the translatable components to estimate the work load involved in translation of the at least some of the translatable components from the first language to the second language; and providing the statistics to characterize a service related to synchronizing the content in the first and second languages.
 9. The medium according to claim 8, wherein the translation includes human translating the at least some of the plurality of translatable components.
 10. The medium according to claim 8, wherein the information, when read, further causes the machine to perform the following: adding the at least some of the plurality of translatable components to a translation list for translation into the second language.
 11. The medium according to claim 8, wherein the information, when read, further causes the machine to perform the following: generating an identifier for each of the plurality of translatable components such that each of the plurality of translatable components is accessible via a corresponding identifier.
 12. The medium according to claim 11, wherein the identifier for a text segment is generated using at least one of a hash code, a checksum, and a mathematical algorithm based on one or more text segments.
 13. The medium according to claim 8, wherein the statistics include at least one of a file count, a page count, a text segment count, a unique text segment count, a word count, and a unique word count.
 14. The medium according to claim 8, wherein the generating comprises: computing the statistics based on information associated with any of the at least some of the plurality of translatable components that do not have a corresponding translated component in the second language.
 15. A system having at least one processor, storage, and a communication platform, for providing statistics characterizing translation work in synchronizing content in different languages, wherein the at least one processor is configured for: receiving a request from a user for accessing content hosted on a website in a first language, wherein the user requests to view the content in a second language and at least some of the content in the first language has previously been translated into the second language; obtaining the content in the first language from the website via a publicly accessible network path based on the request; parsing the obtained content in the first language into a plurality of translatable components; accessing a database that stores the content in the second language previously translated as translated components; identifying at least some of the plurality of translatable components that do not have a corresponding translated component in the database; generating statistics based on the at least some of the translatable components to estimate the work load involved in translation of the at least some of the translatable components from the first language to the second language; and providing the statistics to characterize a service related to synchronizing the content in the first and second languages.